Documentation ¶
Overview ¶
Package strsplit addresses a common text-wrapping problem: splitting strings into bounded-size chunks without breaking Unicode characters, while keeping chunk boundaries human-readable (spaces, punctuation, and line breaks) where possible.
Problem ¶
Slicing a string by byte index is unsafe for UTF-8 text: it can split a multi-byte rune in the middle and produce invalid output. Even when the output stays valid, hard cuts in the middle of words make messages difficult to read, for example under chat payload limits, in SMS segmentation, in logs, or in fixed-size transport frames.
strsplit provides chunking helpers that are UTF-8 aware and separator-aware, so chunks stay valid and readable.
How It Works ¶
The package exposes two functions:
- Chunk: splits a full text block, prioritizing newline boundaries first, trimming whitespace per line, then delegating long lines to ChunkLine.
- ChunkLine: splits a single line by maximum byte size, ensuring the split point is at a rune boundary and preferring the closest separator before the limit.
Separator preference order in ChunkLine:
- Unicode whitespace.
- Unicode punctuation (kept with the preceding chunk).
- Hard UTF-8-safe cut when no separator exists.
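One way to read this preference order is sketched below. `splitPoint` is a hypothetical function, not the package's implementation, and it assumes that any whitespace within the limit wins over punctuation, even punctuation that sits closer to the limit. It also assumes a positive size.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// splitPoint returns a byte offset such that s[:offset] is at most size bytes
// and ends on a rune boundary. Hypothetical sketch; assumes size > 0.
func splitPoint(s string, size int) int {
	if len(s) <= size {
		return len(s)
	}
	space, punct, hard := -1, -1, 0
	for i := 0; i <= size; {
		r, w := utf8.DecodeRuneInString(s[i:])
		switch {
		case unicode.IsSpace(r):
			space = i // cut before the whitespace
		case unicode.IsPunct(r) && i+w <= size:
			punct = i + w // cut after the mark so it stays with this chunk
		}
		if i+w <= size {
			hard = i + w // last rune boundary that still fits
		}
		i += w
	}
	switch {
	case space > 0:
		return space
	case punct > 0:
		return punct
	default:
		return hard
	}
}

func main() {
	for _, s := range []string{"hello world", "hello,world", "helloworld"} {
		fmt.Printf("%q\n", s[:splitPoint(s, 8)])
	}
	// "hello"
	// "hello,"
	// "helloworl"[:8] has no separator, so the last line is "hellowor"
}
```

A caller would trim the remainder after each cut and repeat until the input is exhausted.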
Both functions support an optional chunk limit `n`:
- `n > 0`: return at most `n` chunks.
- `n < 0`: unlimited chunks.
- `n == 0`: return nil.
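These three cases use the same sign convention as `strings.SplitN` in the standard library, with one difference visible in the Chunk example on this page: when the limit is reached, the rest of the input is dropped rather than returned as a final remainder. The sketch below illustrates the convention with a hypothetical `chunkN` helper that makes hard rune-safe cuts only and ignores separators.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// chunkN cuts s into pieces of at most size bytes on rune boundaries and
// stops after n pieces when n > 0. Hypothetical sketch; separator handling
// and trimming are omitted.
func chunkN(s string, size, n int) []string {
	if n == 0 || size <= 0 {
		return nil
	}
	var out []string
	for len(s) > 0 && (n < 0 || len(out) < n) {
		cut := size
		if cut >= len(s) {
			cut = len(s)
		} else {
			// Step back to the nearest rune boundary.
			for cut > 0 && !utf8.RuneStart(s[cut]) {
				cut--
			}
			if cut == 0 {
				break // the next rune does not fit in size bytes
			}
		}
		out = append(out, s[:cut])
		s = s[cut:]
	}
	return out
}

func main() {
	const text = "helloworldbellaciao"
	fmt.Println(chunkN(text, 5, 3))        // [hello world bella]
	fmt.Println(chunkN(text, 5, -1))       // [hello world bella ciao]
	fmt.Println(chunkN(text, 5, 0) == nil) // true
}
```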
Key Features ¶
- UTF-8 safety: never cuts in the middle of a multi-byte rune.
- Readability-aware splitting: prefers spaces and punctuation over arbitrary byte boundaries.
- Newline-first semantics in Chunk: preserves natural paragraph structure.
- Bounded output control via `n`, useful for APIs with strict item limits.
- Deterministic trimming of leading/trailing whitespace in produced chunks.
Usage ¶
chunks := strsplit.Chunk(text, 280, -1)      // split full text block
lineParts := strsplit.ChunkLine(line, 64, 3) // at most 3 chunks
The package suits Go applications that need Unicode-aware message segmentation under byte-size constraints.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Chunk ¶
Chunk splits a text block into substrings no longer than the maximum size, cutting at newline boundaries first and at separator boundaries within long lines. It trims whitespace and, when n > 0, returns at most n chunks.
Example ¶
package main

import (
	"fmt"

	"github.com/tecnickcom/gogen/pkg/strsplit"
)

func main() {
	str := "helloworld\nbellaciao"

	// Maximum size 5, at most 3 chunks: the fourth chunk ("ciao") is dropped.
	d := strsplit.Chunk(str, 5, 3)
	fmt.Println(d)
}
Output: [hello world bella]
func ChunkLine ¶
ChunkLine splits a single line into substrings no longer than the maximum byte size, cutting only at UTF-8 rune boundaries and preferring whitespace and punctuation separators. When n > 0, it returns at most n chunks.
Example ¶
package main

import (
	"fmt"

	"github.com/tecnickcom/gogen/pkg/strsplit"
)

func main() {
	str := "hello,world"

	// Maximum size 8, no chunk limit: the comma stays with the first chunk.
	d := strsplit.ChunkLine(str, 8, -1)
	fmt.Println(d)
}
Output: [hello, world]
Types ¶
This section is empty.