`@jscpd/tokenizer`

Tokenizer package for @jscpd — converts source code into a list of tokens for duplicate detection.

Supports 223 programming languages and formats via a self-contained reprism-based grammar engine. Grammars are loaded lazily for fast startup, with O(n) hot paths for high-throughput scanning.
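Lazy loading here means a grammar is built only when a file in that language is first seen, then reused. The following is a minimal sketch of that idea, not the package's actual code: `LazyGrammarRegistry`, `Grammar`, and `GrammarFactory` are hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch of lazy grammar loading. None of these names come
// from @jscpd/tokenizer; they only illustrate the general technique.
type Grammar = { name: string; keywords: string[] };
type GrammarFactory = () => Grammar;

class LazyGrammarRegistry {
  // Cheap to register: only the factory function is stored up front.
  private readonly factories = new Map<string, GrammarFactory>();
  // Grammars that have actually been built, keyed by language name.
  private readonly loaded = new Map<string, Grammar>();

  register(name: string, factory: GrammarFactory): void {
    this.factories.set(name, factory);
  }

  // Builds the grammar on first request and serves the cached copy afterwards.
  get(name: string): Grammar | undefined {
    const cached = this.loaded.get(name);
    if (cached !== undefined) {
      return cached;
    }
    const factory = this.factories.get(name);
    if (factory === undefined) {
      return undefined; // unknown language
    }
    const grammar = factory();
    this.loaded.set(name, grammar);
    return grammar;
  }

  // Number of grammars built so far.
  loadedCount(): number {
    return this.loaded.size;
  }
}
```

With this shape, registering hundreds of languages costs only one map entry each at startup, and the cost of building a grammar is paid only for languages that actually appear in the scanned files.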

Special tokenization modes handle multi-language files:

- Vue SFC (.vue): <template>, <script>, and <style> blocks are each tokenized by their own language
- Svelte (.svelte): per-block tokenization for HTML, JS, and CSS sections
- Astro (.astro): frontmatter and template blocks are tokenized independently
- Markdown (.md): fenced code blocks are tokenized by the declared language

This enables cross-format clone detection: a <script lang="ts"> block in a .vue file can match a plain .ts file.
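To make the per-block idea concrete, here is a minimal sketch of how a single-file component could be split into blocks, each labeled with the language a tokenizer would then use. It is an illustration under stated assumptions, not the package's implementation: `splitSfcBlocks`, `SfcBlock`, the default format names, and the alias table are all hypothetical.

```typescript
// Hypothetical sketch: NOT the @jscpd/tokenizer implementation.
interface SfcBlock {
  tag: 'template' | 'script' | 'style';
  format: string; // language name a tokenizer would be given for this block
  content: string;
}

// Assumed default language per block when no lang attribute is present.
const DEFAULT_FORMAT: Record<SfcBlock['tag'], string> = {
  template: 'markup',
  script: 'javascript',
  style: 'css',
};

// Assumed mapping from short lang values to full format names.
const LANG_ALIASES: Record<string, string> = {
  ts: 'typescript',
  js: 'javascript',
};

function splitSfcBlocks(source: string): SfcBlock[] {
  const blocks: SfcBlock[] = [];
  // The backreference \1 pairs each opening tag with its own closing tag.
  // Limitation of this sketch: a nested <template> ends the match early.
  const blockPattern = /<(template|script|style)\b([^>]*)>([\s\S]*?)<\/\1>/g;
  let match: RegExpExecArray | null;
  while ((match = blockPattern.exec(source)) !== null) {
    const tag = match[1] as SfcBlock['tag'];
    const attrs = match[2] ?? '';
    const body = match[3] ?? '';
    // Read lang="..." from the opening tag, if present.
    const rawLang = /\blang=["']([^"']+)["']/.exec(attrs)?.[1];
    const format =
      rawLang === undefined ? DEFAULT_FORMAT[tag] : LANG_ALIASES[rawLang] ?? rawLang;
    blocks.push({ tag, format, content: body.trim() });
  }
  return blocks;
}
```

Because the script block of a component with `lang="ts"` is labeled with the same format name as a standalone .ts file, tokens from the two sources become directly comparable, which is what makes cross-format matches possible.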

Installation

npm install @jscpd/tokenizer --save

Usage

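import { IOptions, ITokensMap } from '@jscpd/core';
import { Tokenizer } from '@jscpd/tokenizer';

const tokenizer = new Tokenizer();
// An empty options object; set fields on it as needed.
const options: IOptions = {};

// Arguments, in order: a source identifier, the source text, the format name, and the options.
const maps: ITokensMap[] = tokenizer.generateMaps('source_id', 'let a = "11"', 'javascript', options);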

Supported formats

The full list of 223 supported formats is available in FORMATS.md at the repository root, or at runtime:

jscpd --list
