chunking-algorithm

Here are 30 public repositories matching this topic...

feyninc / chonkie

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

ai splitting-algorithms similarity-search chunker rag retrieval-systems chunking-algorithm text-splitter llms chonkie semantic-chunker

Updated Jun 29, 2026
Python
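
Chonkie and similar RAG chunkers offer several strategies. The simplest baseline is fixed-size chunking with overlap, sketched below. This is not Chonkie's API: `chunk_tokens` is a hypothetical helper, and whitespace splitting stands in for a real tokenizer.

```python
from typing import List


def chunk_tokens(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into windows of `chunk_size` whitespace tokens.

    Consecutive windows share `overlap` tokens so context is not lost at chunk
    boundaries. Whitespace splitting is a stand-in for a real tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    tokens = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once a window reaches the end so the tail is not emitted twice.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    text = "one two three four five six seven eight nine ten"
    for chunk in chunk_tokens(text, chunk_size=4, overlap=1):
        print(chunk)
```

The overlap trades some duplicated storage for fewer sentences cut in half at chunk edges.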

feyninc / chonkiejs

🦛 CHONK your texts with Chonkie ✨ Type-friendly, light-weight, fast and super-simple chunking library

typescript ai splitting-algorithms chunker rag retrieval-systems chunking-algorithm text-splitter llms chonkie semantic-chunker

Updated Jun 18, 2026
TypeScript

nlfiedler / fastcdc-rs

FastCDC implementation in Rust

rust deduplication chunking-algorithm

Updated Jun 24, 2026
Rust
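
FastCDC finds content-defined cut points with a gear rolling hash (`fp = (fp << 1) + Gear[byte]`). It skips hashing until a minimum chunk size is reached. It also uses "normalized chunking": a stricter mask applies before the target average size and a looser one after it. The sketch below illustrates that idea in Python rather than reproducing fastcdc-rs. Several details are assumptions: the seeded gear table, the high-bit mask layout, the default sizes, and the name `fastcdc_cut_points`. The published algorithm uses a fixed gear table and specific mask constants.

```python
import random
from typing import List

MASK64 = (1 << 64) - 1

# Assumption: a seeded pseudo-random gear table. Real FastCDC implementations
# ship a fixed table of 256 64-bit constants.
_rng = random.Random(2016)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def _high_mask(bits: int) -> int:
    """Select the top `bits` bits of a 64-bit fingerprint.

    With a left-shifting gear hash, high bits depend on a wider window of
    recent bytes than low bits do.
    """
    return ((1 << bits) - 1) << (64 - bits)


def fastcdc_cut_points(data: bytes, min_size: int = 2048, avg_size: int = 8192,
                       max_size: int = 65536, level: int = 2) -> List[int]:
    """Return the exclusive end offset of each chunk, FastCDC-style."""
    bits = avg_size.bit_length() - 1           # log2(avg_size); assumes a power of two
    if bits - level <= 0 or bits + level > 64:
        raise ValueError("normalization level out of range for avg_size")
    mask_s = _high_mask(bits + level)          # harder to match: used below avg_size
    mask_l = _high_mask(bits - level)          # easier to match: used above avg_size
    cuts: List[int] = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_size, n)         # hard cap at max_size
        normal = min(start + avg_size, end)
        cut, fp = end, 0
        for i in range(start + min_size, end):  # no cut can fall inside min_size
            fp = ((fp << 1) + GEAR[data[i]]) & MASK64
            mask = mask_s if i < normal else mask_l
            if (fp & mask) == 0:
                cut = i + 1
                break
        cuts.append(cut)
        start = cut
    return cuts


if __name__ == "__main__":
    data = random.Random(1).randbytes(200_000)
    cuts = fastcdc_cut_points(data)
    sizes = [b - a for a, b in zip([0] + cuts, cuts)]
    print(len(cuts), min(sizes), max(sizes))
```

Each cut decision depends only on recently hashed bytes. An edit therefore usually disturbs only the chunks near it, which is what makes CDC useful for deduplication.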

GiovanniPasq / chunky

Open-source toolkit for reliable RAG pipelines: convert PDFs to Markdown, clean documents, inspect chunks, compare chunking strategies, and enrich metadata for LLM applications.

Updated Jun 6, 2026
Python

iscc / fastcdc-py

FastCDC implementation in Python https://pypi.org/project/fastcdc/

python chunking deduplication content-dependent chunking-algorithm

Updated Jun 27, 2024
Python

ayush585 / SmartChunk

SmartChunk is a lightweight, structure-aware semantic chunking toolkit designed to supercharge RAG (Retrieval-Augmented Generation) and LLM pipelines. Unlike naive splitters that break text arbitrarily, SmartChunk respects document structure (headings, lists, tables, code blocks) and semantic flow, ensuring cleaner, more coherent chunks.

nlp cli package semantic pip chunking rag chunking-algorithm llm agentic-workflow

Updated Feb 6, 2026
Python
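
Structure-aware chunking can be sketched as two passes. The first pass splits Markdown into blocks at blank lines and headings while keeping fenced code blocks whole. The second pass packs those blocks into size-bounded chunks and always starts a new chunk at a heading. This illustration rests on simplifying assumptions: only ATX headings are recognized, any fence marker toggles fence state, and oversized blocks are not split. It is not SmartChunk's implementation or API, and the function names are hypothetical.

```python
import re
from typing import List

HEADING = re.compile(r"#{1,6}\s")  # ATX heading at the start of a line
FENCES = ("`" * 3, "~~~")           # fence markers; "`" * 3 avoids a literal triple backtick here


def split_blocks(markdown: str) -> List[str]:
    """Split Markdown into blocks at blank lines and headings, keeping fenced code whole."""
    blocks: List[str] = []
    lines: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCES):
            in_fence = not in_fence              # simplification: any fence marker toggles
        elif not in_fence and (not line.strip() or HEADING.match(line)):
            if lines:                            # a blank line or heading ends the block
                blocks.append("\n".join(lines))
                lines = []
            if not line.strip():
                continue                         # drop the blank separator itself
        lines.append(line)
    if lines:
        blocks.append("\n".join(lines))
    return blocks


def structure_chunks(markdown: str, max_chars: int = 1000) -> List[str]:
    """Pack blocks into chunks of at most `max_chars`, starting a new chunk at each heading.

    A single block longer than `max_chars` (e.g. a large code block) is kept whole.
    """
    chunks: List[str] = []
    current: List[str] = []
    for block in split_blocks(markdown):
        starts_section = HEADING.match(block) is not None
        too_long = len("\n\n".join(current + [block])) > max_chars
        if current and (starts_section or too_long):
            chunks.append("\n\n".join(current))
            current = []
        current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because blank lines inside a fence never end a block, a code example is never split between two chunks.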

gidea / chunkpad

Chunkpad is designed to prepare documents for Retrieval-Augmented Generation (RAG) pipelines and AI applications.

text-editor chunking-algorithm vector-database rag-pipeline

Updated Nov 30, 2025
TypeScript

mg98 / ae-chunker-go

Go implementation of the AE chunking algorithm.

go golang chunking chunking-algorithm

Updated Jan 4, 2023
Go
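
The AE (Asymmetric Extremum) algorithm avoids rolling hashes. It tracks the maximum value seen in the current chunk and cuts once a fixed-size window of bytes after that maximum contains nothing larger. The sketch below is minimal and rests on two assumptions: it compares single-byte values, and it adds a hypothetical `max_size` cap. It is not ae-chunker-go's code, and the original algorithm's value width and tie handling may differ.

```python
from typing import List


def ae_cut_points(data: bytes, window: int = 256, max_size: int = 65536) -> List[int]:
    """Return the exclusive end offset of each chunk using an AE-style scan.

    A cut is made once `window` bytes have followed the running maximum without
    exceeding it. Equal values do not move the maximum.
    """
    if window <= 0 or max_size <= 0:
        raise ValueError("window and max_size must be positive")
    cuts: List[int] = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_size, n)   # hypothetical cap on chunk length
        max_val, max_pos = data[start], start
        cut = end
        for i in range(start + 1, end):
            if data[i] > max_val:
                max_val, max_pos = data[i], i   # new extremum: restart the window
            elif i == max_pos + window:
                cut = i + 1                     # chunk includes the byte at i
                break
        cuts.append(cut)
        start = cut
    return cuts
```

Every chunk not cut by the cap is at least `window + 1` bytes long, so the window size also acts as an implicit minimum chunk size.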

FastPix / android-uploads-sdk

Android resumable uploads SDK from FastPix

android kotlin java retrofit2 resumable-upload chunking-algorithm

Updated Jun 16, 2026
Kotlin
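
Resumable uploads generally work in three steps: split a file into fixed-size chunks, send each chunk with its byte range, and after an interruption restart from the offset the server has committed. The sketch below only plans those ranges in HTTP `Content-Range` format (`bytes first-last/total`). `plan_chunks` and `ChunkSpec` are hypothetical names, and the actual FastPix upload protocol is not described here.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class ChunkSpec:
    index: int
    start: int  # first byte offset, inclusive
    end: int    # last byte offset, inclusive (Content-Range convention)
    total: int  # total file size in bytes

    @property
    def content_range(self) -> str:
        return f"bytes {self.start}-{self.end}/{self.total}"


def plan_chunks(total_size: int, chunk_size: int, resume_from: int = 0) -> Iterator[ChunkSpec]:
    """Yield the chunks still to send, starting from byte offset `resume_from`.

    `resume_from` is assumed to come from the server. It is rounded down to a
    chunk boundary, so a partially sent chunk is sent again in full.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk_count = -(-total_size // chunk_size)  # ceiling division
    for index in range(resume_from // chunk_size, chunk_count):
        start = index * chunk_size
        end = min(start + chunk_size, total_size) - 1
        yield ChunkSpec(index, start, end, total_size)


if __name__ == "__main__":
    for spec in plan_chunks(total_size=10_000_000, chunk_size=4 * 1024 * 1024, resume_from=5_000_000):
        print(spec.index, spec.content_range)
    # 1 bytes 4194304-8388607/10000000
    # 2 bytes 8388608-9999999/10000000
```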

Fallen-Breath / pyfastcdc

A high-performance FastCDC 2020 implementation written in Python + Cython

python deduplication chunking-algorithm fastcdc

Updated Jun 29, 2026
Python

arcadiasofts / clast-rs

A Rust library for Content-Defined Chunking (CDC).

rust-library chunking-algorithm content-defined-chunking

Updated Jan 6, 2026
Rust
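
Besides FastCDC and AE, a common way to implement CDC is to combine a rolling hash over a fixed-size window with a content-addressed store for deduplication. The sketch below uses a Rabin-Karp-style polynomial hash modulo a prime (not a true GF(2) Rabin fingerprint), restarted at each chunk boundary, together with SHA-256 chunk digests. `rolling_hash_chunks`, `ChunkStore`, and all parameters are hypothetical and are not clast-rs's API.

```python
import hashlib
import random
from typing import Dict, List

BASE = 257
MOD = (1 << 61) - 1  # Mersenne prime modulus for the polynomial hash


def rolling_hash_chunks(data: bytes, window: int = 48, mask_bits: int = 12,
                        min_size: int = 512, max_size: int = 16384) -> List[bytes]:
    """Cut after byte i when the hash of the last `window` bytes has its low
    `mask_bits` bits all zero and the chunk is at least `min_size` long."""
    mask = (1 << mask_bits) - 1
    out_weight = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    chunks: List[bytes] = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_size, n)
        cut, h = end, 0
        for i in range(start, end):
            if i - start >= window:
                h = (h - data[i - window] * out_weight) % MOD  # drop the oldest byte
            h = (h * BASE + data[i]) % MOD                     # append the newest byte
            if i + 1 - start >= min_size and (h & mask) == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


class ChunkStore:
    """Content-addressed store: each distinct chunk is kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Store `data` and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for chunk in rolling_hash_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.blobs.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        return b"".join(self.blobs[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    v1 = random.Random(1).randbytes(100_000)
    v2 = bytearray(v1)
    v2[70_000:70_010] = bytes(b ^ 0xFF for b in v1[70_000:70_010])  # small in-place edit
    store = ChunkStore()
    r1, r2 = store.put(v1), store.put(bytes(v2))
    print(len(r1), len(r2), len(set(r1) & set(r2)), store.stored_bytes())
```

Chunks that end before the edited region are identical in both versions, so the store keeps them only once. Boundaries after an edit usually resynchronize as well, but that is probabilistic rather than guaranteed.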

isaka-james / chunks-to-file

A Node.js chunking system

nodejs chunk chunking chunked-uploads chunks chunking-algorithm chunking-files nodejs-chunking node-chunking

Updated Sep 26, 2024
JavaScript

mahnoorsheikh16 / NLP-Framework-for-Literature-Summarization-in-Law-and-Policy

Implementation of an interactive chatbot for summarizing legal and policy documents. Includes data preprocessing (cleaning, tokenization, hierarchical chunking), extractive TF-IDF baselines, and fine-tuned abstractive models (DistilBART, LED). Integrates a retrieval layer for document relevance and uses ROUGE, BLEU, and cosine similarity metrics.

led text-summarization cosine-similarity rouge-metric nlp-keywords-extraction policy-analysis tokenization bleu-score encoder-decoder-model retrieval-chatbot rag chunking-algorithm longformer-models distilbart rag-chatbot qa-reterival

Updated Jan 16, 2026
Jupyter Notebook
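
The hierarchical chunking mentioned in this project's preprocessing can be sketched as a chain of section, then paragraph, then fixed-size word windows. Each chunk keeps its section title and paragraph index, so a retriever can pull in surrounding context. The input format (a dict mapping section titles to text), the `max_words` limit, and the names are assumptions, not the repository's code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Chunk:
    section: str     # parent section title (top level of the hierarchy)
    paragraph: int   # paragraph index within the section
    text: str        # the chunk text itself


def hierarchical_chunks(sections: Dict[str, str], max_words: int = 200) -> List[Chunk]:
    """Chunk section -> paragraph -> word windows of at most `max_words` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    chunks: List[Chunk] = []
    for title, body in sections.items():
        paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
        for index, para in enumerate(paragraphs):
            words = para.split()
            # Short paragraphs stay whole; long ones become consecutive windows.
            for start in range(0, len(words), max_words):
                chunks.append(Chunk(title, index, " ".join(words[start:start + max_words])))
    return chunks
```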

i5heu / ChunkingChampions

Explore and benchmark the world of data chunking algorithms in 'ChunkingChampions', a competitive arena for determining the most efficient and effective chunking strategies for varied data sizes.

benchmark ranking chunking chunking-algorithm

Updated Apr 6, 2024

Pavansomisetty21 / Chunking-Strategies

A detailed overview of chunking

chunk chunking chunks chunking-algorithm

Updated Sep 26, 2024

D-X-W-Clerker / clerker-ai

[2024-2] AI-based meeting support platform service "Clerker"

ai deep-learning summarization stt chunking-algorithm llm

Updated Nov 25, 2024
Python

sanbaiw / semtxtsplitter

A smol Go package for splitting text into chunks while preserving semantic meaning.

nlp rag chunking-algorithm

Updated Apr 28, 2025
Go
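
Semantic splitters typically embed each sentence and start a new chunk where the similarity between neighboring sentences drops. The sketch below is minimal and rests on two assumptions: sentences are found with a regex splitter, and bag-of-words cosine similarity stands in for real embeddings. `semantic_split` and its `threshold` value are hypothetical and are not semtxtsplitter's API.

```python
import math
import re
from collections import Counter
from typing import List


def _vector(sentence: str) -> Counter:
    # Stand-in for a sentence embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_split(text: str, threshold: float = 0.2) -> List[str]:
    """Group consecutive sentences; start a new chunk when adjacent similarity < threshold."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks: List[str] = []
    current = [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if _cosine(_vector(prev), _vector(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    print(semantic_split("Cats purr. Cats sleep a lot. Rust compiles fast. Rust is safe."))
    # ['Cats purr. Cats sleep a lot.', 'Rust compiles fast. Rust is safe.']
```

Swapping `_vector` and `_cosine` for a real embedding model keeps the same boundary logic.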

Ayan113 / Documind.AI

Documind.AI is a full-stack Generative AI application that allows users to interact with PDF documents through a chat-based interface.

chunking-algorithm vector-database llms rag-pipeline

Updated Mar 23, 2026
JavaScript

AasthaPJoshi / RetrievalLab

Cross-industry RAG benchmarking platform with 10 chunking strategies, 3 retrieval modes (BM25/Vector/Hybrid RRF), a 5-node LangGraph agent, and NDCG@K, MRR, MAP, Ragas, and adversarial-robustness evaluation across healthcare, finance, and legal domains

Updated Jun 29, 2026
Python
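
"Hybrid RRF" refers to Reciprocal Rank Fusion. RRF merges ranked lists, for example from BM25 and vector search, by summing 1/(k + rank) for each document, with ranks starting at 1 and k = 60 as the commonly used constant. The sketch below is minimal; the function name and document ids are illustrative and are not taken from RetrievalLab.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document ids; each list is ordered best-first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    bm25 = ["a", "b", "c"]
    vector = ["b", "c", "d"]
    print(reciprocal_rank_fusion([bm25, vector]))  # ['b', 'c', 'a', 'd']
```

RRF uses only ranks, so BM25 scores never need to be calibrated against cosine similarities. Documents ranked well by both retrievers rise to the top.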

jairamshegde / My-Awesome-RAG-Reads

This is my curated repo on Retrieval-Augmented Generation (RAG).

embeddings rrf chunking-algorithm vector-database hybrid-search rag-chatbot agentic-rag

Updated Feb 2, 2026
