Analyse the programming languages used in a folder or from raw content, using the same rules that GitHub Linguist does.
Analyses the languages of all files in a given folder or folders and collates the results.
Powered by github-linguist, although it doesn't need to be installed.
Node.js must be installed to be able to use LinguistJS.
LinguistJS is available on npm as linguist-js.
Install locally using npm install linguist-js and import it into your code like so:
const linguist = require('linguist-js');
Or install globally using npm install -g linguist-js and run using the CLI command linguist or linguist-js.
linguist --help
linguist-js --help
LinguistJS analyses a folder, or a dictionary of already-read file content, and determines what programming languages are used within.
As an example, take the following file structure:
/
| src
| | cli.js 1kB
| | index.ts 2kB
| info.md 3kB
| no-lang 10B
This may be represented in object form as a mapping of file path to file content.
Running LinguistJS on this will return the following JSON:
{
"files": {
"count": 4,
"bytes": 6010,
"lines": { "total": 150, "content": 75 },
"results": {
"/info.md": "Markdown",
"/no-lang": null,
"/src/cli.js": "JavaScript",
"/src/index.ts": "TypeScript"
}
},
"languages": {
"count": 3,
"bytes": 6000,
"lines": { "total": 147, "content": 74 },
"results": {
"Markdown": { "count": 1, "bytes": 3000, "lines": { "total": 10, "content": 5 } },
"JavaScript": { "count": 1, "bytes": 1000, "lines": { "total": 46, "content": 23 } },
"TypeScript": { "count": 1, "bytes": 2000, "lines": { "total": 91, "content": 46 } }
}
},
"unknown": {
"count": 1,
"bytes": 10,
"lines": { "total": 3, "content": 1 },
"extensions": {},
"filenames": { "no-lang": 10 }
},
"repository": {
"Markdown": { "type": "prose", "color": "#083fa1" },
"JavaScript": { "type": "programming", "color": "#f1e05a" },
"TypeScript": { "type": "programming", "color": "#3178c6" }
}
}
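The CLI's --tree option takes a dot-delimited traversal into this output. As a rough illustration of what such a lookup involves, here is a minimal sketch. getByPath is a hypothetical helper written for this example and is not part of LinguistJS; the CLI's own handling may differ.

```javascript
// Hypothetical helper: follow a dot-delimited path through a nested object.
// Returns undefined as soon as a segment is missing.
function getByPath(data, path) {
  let current = data;
  for (const key of path.split('.')) {
    if (current === null || typeof current !== 'object' || !(key in current)) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

// Trimmed copy of the example output above.
const output = {
  files: { count: 4, bytes: 6010 },
  languages: {
    count: 3,
    bytes: 6000,
    results: {
      Markdown: { count: 1, bytes: 3000 },
      JavaScript: { count: 1, bytes: 1000 },
      TypeScript: { count: 1, bytes: 2000 },
    },
  },
  unknown: { count: 1, bytes: 10, extensions: {}, filenames: { 'no-lang': 10 } },
};

console.log(getByPath(output, 'languages.results.TypeScript.bytes')); // 2000
console.log(getByPath(output, 'unknown.filenames.no-lang')); // 10
console.log(getByPath(output, 'languages.results.Rust')); // undefined
```

Because this sketch splits on every dot, keys that themselves contain dots (such as the file paths under files.results) cannot be reached this way.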
Language data comes from github-linguist.
This data is subject to change at any time and may change the results of a run even when using the same version of Linguist.
import linguist from 'linguist-js';
// Analyse folders on disc
const folders = ['./src'];
const folderOptions = { keepVendored: false, quick: false };
const { files, languages, unknown, repository } = await linguist.analyseFolders(folders, folderOptions);
// Analyse file content from raw input
const fileContent = {
  'file1.ts': '#!/usr/bin/env node',
  'file2.ts': 'console.log("Example");',
  'ignoreme.js': 'ignored!',
};
const rawOptions = { ignoredFiles: ['ignoreme.*'] };
// rawResults has the same shape: { files, languages, unknown, repository }
const rawResults = await linguist.analyseRawContent(fileContent, rawOptions);
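Once a run has resolved, the languages part of the result can be post-processed like any other object. The following is a minimal sketch that turns byte counts into percentage shares. languageShares is a hypothetical helper, not something LinguistJS exports, and the sample data is copied from the example output earlier in this document.

```javascript
// Hypothetical helper: convert the `languages` part of a result into
// percentage shares by byte count (one decimal place), largest first.
function languageShares(languages) {
  const total = languages.bytes;
  return Object.entries(languages.results)
    .map(([name, { bytes }]) => ({
      name,
      bytes,
      percent: total === 0 ? 0 : Math.round((bytes / total) * 1000) / 10,
    }))
    .sort((a, b) => b.bytes - a.bytes);
}

// Sample data taken from the example output shown earlier.
const sampleLanguages = {
  count: 3,
  bytes: 6000,
  results: {
    Markdown: { count: 1, bytes: 3000 },
    JavaScript: { count: 1, bytes: 1000 },
    TypeScript: { count: 1, bytes: 2000 },
  },
};

for (const { name, percent } of languageShares(sampleLanguages)) {
  console.log(`${name}: ${percent}%`);
}
// Markdown: 50%
// TypeScript: 33.3%
// JavaScript: 16.7%
```

The shares here are measured against the languages total (6000 bytes), so files with an unknown language are left out of the calculation.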
Exports:
analyseFolders(folders?, opts?):
Analyse the language of all files found in a folder or folders.
folders (optional; string array):
A list of folders to analyse (defaults to ['./']).
opts (optional; object):
An object containing analyser options.
analyseRawContent(fileContent?, opts?):
Analyse the language of files supplied as raw content.
fileContent (optional; object):
A mapping of file path to file content.
opts (optional; object):
An object containing analyser options.
Analyser options:
ignoredFiles (string array):
A list of file path globs to explicitly ignore.
ignoredLanguages (string array):
A list of languages to ignore.
categories (string array):
A list of programming language categories that should be included in the results.
Defaults to ['data', 'markup', 'programming', 'prose'].
childLanguages (boolean):
Whether to display sub-languages instead of their parents when possible (defaults to false).
quick (boolean):
Whether to skip complex language analysis such as the checking of heuristics and gitattributes statements (defaults to false).
Alias for checkAttributes:false, checkIgnored:false, checkDetected:false, checkHeuristics:false, checkShebang:false, checkModeline:false.
offline (boolean):
Whether to use pre-packaged metadata files instead of fetching them from GitHub at runtime (defaults to false).
calculateLines (boolean):
Whether to calculate line of code totals (defaults to true).
keepVendored (boolean):
Whether to keep vendored files (dependencies, etc) (defaults to false).
Does nothing when fileContent is set.
keepBinary (boolean):
Whether binary files should be included in the output (defaults to false).
relativePaths (boolean):
Change the absolute file paths in the output to be relative to the current working directory (defaults to false).
checkAttributes (boolean):
Force the checking of .gitattributes files (defaults to true unless quick is set).
Does nothing when fileContent is set.
checkIgnored (boolean):
Force the checking of .gitignore files (defaults to true unless quick is set).
Does nothing when fileContent is set.
checkDetected (boolean):
Force files marked with linguist-detectable to show up in the output, even if the file is not part of the declared categories.
checkHeuristics (boolean):
Apply heuristics to ambiguous languages (defaults to true unless quick is set).
checkShebang (boolean):
Check shebang (#!) lines for explicit language classification (defaults to true unless quick is set).
checkModeline (boolean):
Check modelines for explicit language classification (defaults to true unless quick is set).
linguist --analyse [<folders...>] [<options...>]
linguist --help
linguist --version
--analyse:
Analyse the language of all files found in a folder or folders.
[<folders...>]:
The folders to analyse (defaults to ./).
--ignoredFiles <globs...>:
A list of file path globs to ignore.
--ignoredLanguages <languages...>:
A list of languages to exclude from the output.
--categories <categories...>:
A list of language categories that should be displayed in the output.
Must be one or more of data, prose, programming, markup.
--childLanguages:
Display sub-languages instead of their parents, when possible.
--json:
Only affects the CLI output.
Display the outputted language data as JSON.
--tree <traversal>:
Only affects the CLI output.
A dot-delimited traversal to the nested object that should be logged to the console instead of the entire output.
Requires --json to be specified.
--listFiles:
Only affects the visual CLI output.
List each matching file and its size under each outputted language result.
Does nothing if --json is specified.
--quick:
Skip the checking of .gitattributes and .gitignore files for manual language classifications.
Alias for --checkAttributes=false --checkIgnored=false --checkHeuristics=false --checkShebang=false --checkModeline=false.
--offline:
Use pre-packaged metadata files instead of fetching them from GitHub at runtime.
--calculateLines:
Calculate line of code totals from files.
--keepVendored:
Include vendored files (auto-generated files, dependencies folder, etc) in the output.
--keepBinary:
Include binary files in the output.
--relativePaths:
Change the absolute file paths in the output to be relative to the current working directory.
--checkAttributes:
Force the checking of .gitattributes files.
Use alongside --quick to stop it from disabling this option.
--checkIgnored:
Force the checking of .gitignore files.
Use alongside --quick to stop it from disabling this option.
--checkDetected:
Force files marked with linguist-detectable to show up in the output, even if the file is not part of the declared --categories.
Use alongside --quick to stop it from disabling this option.
--checkHeuristics:
Apply heuristics to ambiguous languages.
Use alongside --quick to stop it from disabling this option.
--checkShebang:
Check shebang (#!) lines for explicit classification.
Use alongside --quick to stop it from disabling this option.
--checkModeline:
Check modelines for explicit classification.
Use alongside --quick to stop it from disabling this option.
--help:
Display the help message.
--version:
Display the current installed version of LinguistJS.
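The way quick interacts with the individual check options can be pictured with a small sketch. This is an illustration of the documented behaviour only, not LinguistJS's actual implementation. resolveChecks is a hypothetical helper, the list of checks follows the alias given for the quick analyser option, and treating checkDetected as on by default is an assumption, since its default is not stated.

```javascript
// The checks that `quick` switches off, per the alias listed for the
// `quick` analyser option.
const QUICK_CHECKS = [
  'checkAttributes',
  'checkIgnored',
  'checkDetected', // assumption: defaults to on, like the other checks
  'checkHeuristics',
  'checkShebang',
  'checkModeline',
];

// Hypothetical helper: each check is on unless `quick` is set,
// and an explicitly supplied value always wins over `quick`.
function resolveChecks(opts = {}) {
  const resolved = {};
  for (const name of QUICK_CHECKS) {
    resolved[name] = opts[name] ?? !opts.quick;
  }
  return resolved;
}

// With quick set and checkShebang forced on, only checkShebang stays enabled.
console.log(resolveChecks({ quick: true, checkShebang: true }));
```

The nullish coalescing operator (??) is used so that an explicit false is respected rather than being replaced by the default.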