nearley vs langium vs antlr4 vs pegjs vs jison
Parsing Libraries for JavaScript

Parsing libraries in JavaScript are tools that help developers analyze and process text, code, or structured data by breaking it down into smaller, manageable components. These libraries are essential for tasks such as building compilers, interpreters, linters, or any application that needs to understand and manipulate the structure of input data. They typically use techniques like lexical analysis and syntax analysis to convert raw input into a structured format, such as an Abstract Syntax Tree (AST), which can then be traversed and manipulated programmatically. Parsing libraries vary in complexity, performance, and features, catering to different use cases ranging from simple data extraction to full-fledged language processing.
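As a rough illustration of the two phases mentioned above, the following hand-written sketch tokenizes a small arithmetic expression (lexical analysis) and then builds an AST with a recursive descent parser (syntax analysis). It is not taken from any of the libraries compared here; the function names tokenize and parse and the node shapes are hypothetical, chosen only for this example.

```javascript
// Lexical analysis: turn raw text into a flat list of tokens.
function tokenize(input) {
  const tokens = [];
  for (const match of input.matchAll(/\d+|\S/g)) {
    const text = match[0];
    if (/^\d+$/.test(text)) tokens.push({ type: 'number', value: Number(text) });
    else if ('+-*/()'.includes(text)) tokens.push({ type: text });
    else throw new SyntaxError(`Unexpected "${text}" at offset ${match.index}`);
  }
  return tokens;
}

// Syntax analysis: build an AST that respects operator precedence.
function parse(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // Shared loop for left-associative binary operators.
  function binary(operand, operators) {
    let node = operand();
    while (peek() && operators.includes(peek().type)) {
      const operator = next().type;
      node = { type: 'binary', operator, left: node, right: operand() };
    }
    return node;
  }

  // expression := term (('+' | '-') term)*
  const expression = () => binary(term, ['+', '-']);
  // term := factor (('*' | '/') factor)*
  const term = () => binary(factor, ['*', '/']);

  // factor := number | '(' expression ')'
  function factor() {
    const token = next();
    if (token && token.type === 'number') return { type: 'number', value: token.value };
    if (token && token.type === '(') {
      const node = expression();
      const close = next();
      if (!close || close.type !== ')') throw new SyntaxError('Expected ")"');
      return node;
    }
    throw new SyntaxError('Expected a number or "("');
  }

  const ast = expression();
  if (pos < tokens.length) throw new SyntaxError(`Unexpected token "${tokens[pos].type}"`);
  return ast;
}

const ast = parse(tokenize('1 + 2 * 3'));
console.log(JSON.stringify(ast, null, 2));
```

The multiplication ends up nested under the right-hand side of the addition, which is the kind of structure a parser generator derives from a grammar instead of from hand-written functions.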

Stat Detail

Package   Downloads   Stars    Size      Issues   Last Publish   License
nearley   4,075,383   3,728    -         198      5 years ago    MIT
langium   1,863,178   924      3.89 MB   117      10 days ago    MIT
antlr4    855,805     18,550   3.09 MB   1,043    a year ago     BSD-3-Clause
pegjs     679,095     4,915    -         117      9 years ago    MIT
jison     79,695      4,387    -         163      8 years ago    MIT

Feature Comparison: nearley vs langium vs antlr4 vs pegjs vs jison

Grammar Definition

  • nearley:

    nearley uses a simple and expressive syntax for defining grammars, supporting both ambiguous and unambiguous grammars. It allows for the inclusion of JavaScript code for actions, providing flexibility in how parsed data is handled.

  • langium:

    langium supports grammar definition using a DSL (Domain-Specific Language) syntax, which is designed to be intuitive for language designers. It provides rich support for defining syntax, semantics, and tooling features like auto-completion and validation.

  • antlr4:

    antlr4 uses a context-free grammar (CFG) format, allowing for complex and hierarchical grammar definitions. It supports features like actions, semantic predicates, and error handling directly within the grammar files.

  • pegjs:

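    pegjs uses Parsing Expression Grammar (PEG) notation to define grammars. It is simple and expressive, allowing for clear and concise grammar definitions. Because the choice operator is ordered (the first matching alternative wins), a PEG is unambiguous by construction and parsing is deterministic. The trade-off is that the order of alternatives matters and left-recursive rules are not supported.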

  • jison:

    jison allows grammar definition using a BNF-like syntax, which is straightforward and easy to understand. It supports embedding JavaScript code for actions, making it flexible for custom parsing logic.
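One way to see how PEG notation yields a deterministic parse is its ordered choice operator. The sketch below hand-rolls three PEG-style combinators to show the behavior; it is an illustration only, not how pegjs is implemented, and the names literal, choice, sequence, and endOfInput are hypothetical.

```javascript
// Each parser takes (input, pos) and returns { value, pos } or null on failure.
const literal = (text) => (input, pos) =>
  input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null;

// Ordered choice: try alternatives left to right and commit to the first success.
const choice = (...parsers) => (input, pos) => {
  for (const parser of parsers) {
    const result = parser(input, pos);
    if (result) return result;
  }
  return null;
};

// Sequence: run parsers one after another, failing if any of them fails.
const sequence = (...parsers) => (input, pos) => {
  const values = [];
  let current = pos;
  for (const parser of parsers) {
    const result = parser(input, current);
    if (!result) return null;
    values.push(result.value);
    current = result.pos;
  }
  return { value: values, pos: current };
};

// Succeeds only when the whole input has been consumed.
const endOfInput = (input, pos) => (pos === input.length ? { value: null, pos } : null);

// 'a' / 'ab': the first alternative wins on "ab", so 'ab' is never tried.
const shortFirst = sequence(choice(literal('a'), literal('ab')), endOfInput);
// 'ab' / 'a': putting the longer alternative first accepts both "ab" and "a".
const longFirst = sequence(choice(literal('ab'), literal('a')), endOfInput);

console.log(shortFirst('ab', 0)); // null
console.log(longFirst('ab', 0)); // { value: ['ab', null], pos: 2 }
```

There is never more than one parse for a given input, but a badly ordered choice can silently reject input that a context-free grammar with the same alternatives would accept.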

Parser Generation

  • nearley:

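    nearley compiles a grammar file (using the nearleyc tool) into a JavaScript module that its Earley parser runtime loads. Earley parsing is a chart-based algorithm that can handle any context-free grammar, including left-recursive and ambiguous ones, rather than offering a choice between top-down and bottom-up strategies.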

  • langium:

    langium generates a parser and typed AST definitions from the grammar, and provides language-server services (validation, completion, and similar editor features) around them. It focuses on providing a complete ecosystem for language development.

  • antlr4:

    antlr4 generates parsers in multiple target languages (JavaScript, Java, C#, etc.) from a single grammar file. The generated parser builds a parse tree, which can be traversed with the generated listener and visitor classes.

  • pegjs:

    pegjs generates fast and efficient parsers from PEG grammars. The generated parsers are JavaScript functions that can be easily integrated into any application, providing a simple API for parsing input strings.

  • jison:

    jison generates JavaScript parsers from the defined grammars, producing efficient and lightweight parser code that can be easily integrated into web applications or Node.js projects.

Error Handling

  • nearley:

    nearley provides basic error handling capabilities, including the ability to report syntax errors and integrate custom error handling logic within the grammar. However, it is relatively simple compared to other libraries.

  • langium:

    langium includes built-in error handling for syntax errors, with support for customizable error messages and integration with IDEs for real-time error feedback. It is designed to provide a good developer experience when working with DSLs.

  • antlr4:

    antlr4 provides advanced error handling capabilities, including customizable error listeners and messages, pluggable recovery strategies, and error nodes that mark unparsable input in the resulting parse tree. This makes it suitable for building robust parsers that can handle malformed input gracefully.

  • pegjs:

    pegjs offers good error handling features, including detailed error messages and the ability to define custom error handling logic within the grammar. It provides clear feedback on parsing errors, which is helpful for debugging.

  • jison:

    jison supports error handling through the definition of error rules in the grammar. Developers can specify custom error messages and recovery strategies, making it flexible for handling parsing errors.
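Whatever the library, useful error reporting usually comes down to telling the user where parsing failed and what was expected there. The following library-independent sketch converts a character offset into a line and column and formats a message with a caret. The helper names locate and formatSyntaxError are hypothetical and do not correspond to the API of any package above.

```javascript
// Convert a zero-based character offset into one-based line and column numbers.
function locate(input, offset) {
  const before = input.slice(0, offset).split('\n');
  return { line: before.length, column: before[before.length - 1].length + 1 };
}

// Build a readable message with a caret pointing at the offending column.
function formatSyntaxError(input, offset, expected) {
  const { line, column } = locate(input, offset);
  const sourceLine = input.split('\n')[line - 1];
  const found = offset < input.length ? JSON.stringify(input[offset]) : 'end of input';
  return [
    `Syntax error at line ${line}, column ${column}: ` +
      `expected ${expected.join(' or ')} but found ${found}`,
    sourceLine,
    ' '.repeat(column - 1) + '^',
  ].join('\n');
}

const source = 'hello Alice\nhello 42';
console.log(formatSyntaxError(source, source.indexOf('42'), ['a name']));
```

A parser generator would supply the offset and the list of expected tokens itself; the formatting step is the part an application typically customizes.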

Performance

  • nearley:

    nearley is lightweight, and its Earley parser performs best on unambiguous grammars. In the worst case, Earley parsing takes time cubic in the input length on highly ambiguous grammars, so nearley trades some raw speed for the ability to accept any context-free grammar.

  • langium:

    langium focuses more on providing rich tooling and features for DSL development rather than raw parsing performance. While it is efficient for typical use cases, it may not be optimized for parsing very large inputs.

  • antlr4:

    antlr4 is optimized for performance, especially when parsing large and complex inputs. It uses efficient algorithms for lexical and syntactic analysis, and the generated parsers are designed to be fast and memory-efficient.

  • pegjs:

    pegjs produces fast parsers with a small memory footprint, making it ideal for performance-sensitive applications. The use of PEGs allows for efficient parsing, and the generated parsers are highly optimized.

  • jison:

    jison generates parsers that are generally efficient, but performance can vary depending on the complexity of the grammar and the amount of embedded JavaScript code. It is suitable for most applications but may not be the best choice for extremely performance-critical scenarios.

Ease of Use: Code Examples

  • nearley:

    nearley Example

    # grammar.ne: a greeting is "hello", a space, and a name
    # The code between {% and %} is a JavaScript postprocessor for the rule
    main -> "hello" " " name {% ([, , name]) => ({ greeting: "hello", name }) %}
    name -> [a-zA-Z]:+ {% ([chars]) => chars.join("") %}

    # Compile the grammar to a JavaScript module:
    #   nearleyc grammar.ne -o grammar.js

    // main.js: use the compiled grammar in JavaScript
    const nearley = require('nearley');
    const grammar = require('./grammar.js');

    const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
    const input = 'hello Alice';
    parser.feed(input);

    // parser.results holds one entry per possible parse of the input
    const result = parser.results;
    console.log(result);
    
  • langium:

    langium Example

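    // Define a simple DSL grammar in Langium syntax (simple.langium)
    grammar Simple

    // Entry rule: a greeting is 'hello' followed by a name
    entry Greeting:
      'hello' name=ID;

    hidden terminal WS: /\s+/;
    terminal ID: /[a-zA-Z]+/;

    // Generate the typed AST and grammar module using the Langium CLI
    // langium generate

    // Use the parser in JavaScript. The module path and the
    // createSimpleServices factory are assumptions based on a typical
    // project scaffold and may differ in your setup.
    import { EmptyFileSystem } from 'langium';
    import { createSimpleServices } from './language/simple-module.js';

    const services = createSimpleServices(EmptyFileSystem).Simple;
    const input = 'hello John';
    const result = services.parser.LangiumParser.parse(input);
    console.log(result.value, result.parserErrors);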
    
  • antlr4:

    antlr4 Example

    // Define a simple grammar in ANTLR syntax (Hello.g4)
    grammar Hello;

    // Parser rules
    r  : 'hello' ID;

    // Lexer rules
    ID : [a-zA-Z]+;
    WS : [ \t\r\n]+ -> skip;

    // Generate the parser using the ANTLR tool
    // antlr4 -Dlanguage=JavaScript Hello.g4

    // Use the generated parser in JavaScript.
    // This CommonJS form matches older runtimes; newer antlr4 releases ship
    // ES modules, where the generated classes are default exports.
    const antlr4 = require('antlr4');
    const HelloLexer = require('./HelloLexer');
    const HelloParser = require('./HelloParser');

    const input = 'hello world';
    const chars = new antlr4.InputStream(input);
    const lexer = new HelloLexer.HelloLexer(chars);
    const tokens = new antlr4.CommonTokenStream(lexer);
    const parser = new HelloParser.HelloParser(tokens);
    const tree = parser.r(); // Parse the input
    console.log(tree.toStringTree(parser.ruleNames)); // Print the parse tree

  • pegjs:

    pegjs Example

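    // Define a simple grammar in PEG.js syntax (grammar.pegjs)
    start
      = 'hello' ' ' name:name   { return { greeting: 'hello', name: name }; }
      / 'goodbye' ' ' name:name { return { greeting: 'goodbye', name: name }; }
      / 'invalid'

    name
      = chars:[a-zA-Z]+ { return chars.join(''); }  // One or more letters

    // Generate the parser using PEG.js (writes grammar.js)
    // pegjs grammar.pegjs

    // Use the generated parser in JavaScript
    const peg = require('./grammar');
    const result = peg.parse('hello John');
    console.log(result); // { greeting: 'hello', name: 'John' }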
    
  • jison:

    jison Example

    // Define a simple grammar in Jison syntax (hello.jison)
    %lex
    %%
    \s+           /* skip whitespace */
    "hello"       return 'HELLO';
    "world"       return 'WORLD';
    [a-zA-Z]+     return 'INVALID';
    /lex

    %start start

    %%
    start
      : HELLO WORLD  { console.log('Parsed: hello world'); }
      | HELLO INVALID  { console.log('Parsed: hello with invalid'); }
      | INVALID WORLD  { console.log('Parsed: invalid with world'); }
      ;

    // Generate the parser using Jison (writes hello.js)
    // jison hello.jison

    // Use the generated parser in JavaScript
    const parser = require('./hello');
    parser.parse('hello world'); // Parsed: hello world
    parser.parse('hello invalid'); // Parsed: hello with invalid
    parser.parse('invalid world'); // Parsed: invalid with world
    
How to Choose: nearley vs langium vs antlr4 vs pegjs vs jison
  • nearley:

    Use nearley if you need a fast and flexible parser generator that supports ambiguous grammars and allows for easy integration with JavaScript applications. It is lightweight and provides a simple API for parsing, making it suitable for both small and large projects.

  • langium:

    Opt for langium if you are working on language development and need a modern framework that supports the creation of Domain-Specific Languages (DSLs) with rich IDE features. It is a TypeScript framework inspired by Eclipse Xtext (not built on top of it) that integrates with the Language Server Protocol, giving language designers strong tooling support.

  • antlr4:

    Choose antlr4 if you need a powerful and feature-rich parser generator that supports multiple languages and provides advanced features like error handling, tree walking, and code generation. It is suitable for complex parsing tasks and building compilers or interpreters.

  • pegjs:

    Choose pegjs if you want a simple and efficient parser generator that uses Parsing Expression Grammars (PEG) to define grammars. It produces fast parsers with a small footprint, making it ideal for projects where performance and simplicity are key.

  • jison:

    Select jison if you prefer a JavaScript-based parser generator that allows you to define grammars using a simple BNF-like syntax. It is easy to use and integrates well with JavaScript projects, making it ideal for creating parsers quickly without a steep learning curve.

README for nearley

nearley

nearley is a simple, fast and powerful parsing toolkit. It consists of:

  1. A powerful, modular DSL for describing languages
  2. An efficient, lightweight Earley parser
  3. Loads of tools, editor plug-ins, and other goodies!

nearley is a streaming parser with support for catching errors gracefully and providing all parsings for ambiguous grammars. It is compatible with a variety of lexers (we recommend moo). It comes with tools for creating tests, railroad diagrams and fuzzers from your grammars, and has support for a variety of editors and platforms. It works in both node and the browser.

Unlike most other parser generators, nearley can handle any grammar you can define in BNF (and more!). In particular, while most existing JS parsers such as PEGjs and Jison choke on certain grammars (e.g. left recursive ones), nearley handles them easily and efficiently by using the Earley parsing algorithm.
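To give a feel for why the Earley algorithm copes with left recursion, here is a minimal Earley recognizer. It is a sketch written for this explanation, not nearley's implementation: it only answers whether the input matches, it does not build parse results, and it assumes a grammar without empty (nullable) rules. The rule format and the names rules and recognize are hypothetical.

```javascript
// Grammar rules: a symbol that names a rule is a nonterminal,
// anything else is matched as a single literal character.
const rules = [
  { lhs: 'sum', rhs: ['sum', '+', 'digit'] }, // left-recursive rule
  { lhs: 'sum', rhs: ['digit'] },
  { lhs: 'digit', rhs: ['1'] },
  { lhs: 'digit', rhs: ['2'] },
  { lhs: 'digit', rhs: ['3'] },
];
const isNonterminal = (symbol) => rules.some((rule) => rule.lhs === symbol);

function recognize(input, start) {
  // chart[i] holds the items (rule, dot position, origin) alive after i characters.
  const chart = Array.from({ length: input.length + 1 }, () => []);
  const add = (i, item) => {
    const exists = chart[i].some(
      (other) =>
        other.rule === item.rule && other.dot === item.dot && other.origin === item.origin
    );
    if (!exists) chart[i].push(item); // deduplication is what stops infinite recursion
  };

  for (const rule of rules) {
    if (rule.lhs === start) add(0, { rule, dot: 0, origin: 0 });
  }

  for (let i = 0; i <= input.length; i++) {
    // chart[i] grows while it is being processed, so use an index loop.
    for (let j = 0; j < chart[i].length; j++) {
      const { rule, dot, origin } = chart[i][j];
      const symbol = rule.rhs[dot];
      if (symbol === undefined) {
        // Complete: advance every item that was waiting for this rule.
        for (const waiting of chart[origin]) {
          if (waiting.rule.rhs[waiting.dot] === rule.lhs) {
            add(i, { rule: waiting.rule, dot: waiting.dot + 1, origin: waiting.origin });
          }
        }
      } else if (isNonterminal(symbol)) {
        // Predict: start every rule for the expected nonterminal at position i.
        for (const next of rules) {
          if (next.lhs === symbol) add(i, { rule: next, dot: 0, origin: i });
        }
      } else if (input[i] === symbol) {
        // Scan: consume one matching character.
        add(i + 1, { rule, dot: dot + 1, origin });
      }
    }
  }

  // Accept if a start rule that began at position 0 is complete at the end.
  return chart[input.length].some(
    (item) =>
      item.rule.lhs === start && item.dot === item.rule.rhs.length && item.origin === 0
  );
}

console.log(recognize('1+2+3', 'sum')); // true
console.log(recognize('1+', 'sum')); // false
```

A naive recursive descent parser would loop forever on the first rule because sum calls itself before consuming any input. Here the prediction step adds each item to the chart at most once, so the left-recursive rule is expanded a single time per position.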

nearley is used by a wide variety of projects:

nearley is an npm staff pick.

Documentation

Please visit our website https://nearley.js.org to get started! You will find a tutorial, detailed reference documents, and links to several real-world examples to get inspired.

Contributing

Please read this document before working on nearley. If you are interested in contributing but unsure where to start, take a look at the issues labeled "up for grabs" on the issue tracker, or message a maintainer (@kach or @tjvr on Github).

nearley is MIT licensed.

A big thanks to Nathan Dinsmore for teaching me how to Earley, Aria Stewart for helping structure nearley into a mature module, and Robin Windels for bootstrapping the grammar. Additionally, Jacob Edelman wrote an experimental JavaScript parser with nearley and contributed ideas for EBNF support. Joshua T. Corbin refactored the compiler to be much, much prettier. Bojidar Marinov implemented postprocessors-in-other-languages. Shachar Itzhaky fixed a subtle bug with nullables.

Citing nearley

If you are citing nearley in academic work, please use the following BibTeX entry.

@misc{nearley,
    author = "Kartik Chandra and Tim Radvan",
    title  = "{nearley}: a parsing toolkit for {JavaScript}",
    year   = {2014},
    doi    = {10.5281/zenodo.3897993},
    url    = {https://github.com/kach/nearley}
}