chunking-algorithm

Here are 30 public repositories matching this topic...

Open-source toolkit for reliable RAG pipelines: convert PDFs to Markdown, clean documents, inspect chunks, compare chunking strategies, and enrich metadata for LLM applications.

  • Updated Jun 6, 2026
  • Python

SmartChunk is a lightweight, structure-aware semantic chunking toolkit for Retrieval-Augmented Generation (RAG) and LLM pipelines. Unlike naive splitters that break text at arbitrary positions, SmartChunk respects document structure (headings, lists, tables, code blocks) and semantic flow, producing cleaner, more coherent chunks.

  • Updated Feb 6, 2026
  • Python
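
To illustrate what "structure-aware" splitting can mean in practice, here is a minimal sketch that cuts Markdown at headings and never cuts inside a fenced code block. It is not SmartChunk's API or implementation; the function name `split_by_structure` and the `max_chars` parameter are hypothetical, chosen only for this example.

```python
import re
from typing import List

FENCE = "`" * 3                      # Markdown code-fence marker
HEADING = re.compile(r"^#{1,6}\s")   # ATX headings: "# Title" .. "###### Title"


def split_by_structure(markdown: str, max_chars: int = 500) -> List[str]:
    """Split Markdown at headings without breaking fenced code blocks.

    Hypothetical helper for illustration only.
    """
    chunks: List[str] = []
    current: List[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the accumulated lines as one chunk, skipping empty ones.
        text = "\n".join(current).strip()
        if text:
            chunks.append(text)
        current.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            # Entering or leaving a code block; never split while inside.
            in_fence = not in_fence
        elif not in_fence:
            if HEADING.match(line):
                # A heading always starts a new chunk.
                flush()
            elif not line.strip() and sum(len(l) + 1 for l in current) >= max_chars:
                # A blank line is a safe place to cut an oversized chunk.
                flush()
        current.append(line)

    flush()
    return chunks
```

A line that looks like a heading inside a code fence (for example a shell comment) stays with its surrounding block, which is the main difference from a splitter that cuts on a fixed character count.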

Implementation of an interactive chatbot for summarizing legal and policy documents. Includes data preprocessing (cleaning, tokenization, hierarchical chunking), extractive TF-IDF baselines, and fine-tuned abstractive models (DistilBART, LED). Integrates a retrieval layer for document relevance and uses ROUGE, BLEU, and cosine similarity metrics.

  • Updated Jan 16, 2026
  • Jupyter Notebook

Cross-industry RAG benchmarking platform with 10 chunking strategies, 3 retrieval modes (BM25, vector, and hybrid RRF), and a 5-node LangGraph agent. Evaluates NDCG@K, MRR, MAP, Ragas metrics, and adversarial robustness across the healthcare, finance, and legal domains.

  • Updated Jun 29, 2026
  • Python
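
The hybrid mode named above, reciprocal rank fusion (RRF), merges several ranked lists by summing `1 / (k + rank)` for each document across the lists. The sketch below shows that standard formula only; how this repository implements it is not stated in the description, and the function name `reciprocal_rank_fusion` and the constant `k = 60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Hypothetical helper for illustration only.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["a", "b", "c"]     # made-up keyword-search results
vector_ranking = ["b", "c", "d"]   # made-up embedding-search results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['b', 'c', 'a', 'd']
```

Documents that appear in both lists ("b" and "c") rise above those found by only one retriever, which is why RRF is a common way to combine BM25 and vector results without calibrating their raw scores.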
