Module pyparsing

pyparsing module - Classes and methods to define and execute parsing grammars

The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional lex/yacc approach, or the use of regular expressions. With pyparsing, you don't need to learn a new syntax for defining grammars or matching expressions - the parsing module provides a library of classes that you use to construct the grammar directly in Python.

Here is a program to parse "Hello, World!" (or any greeting of the form ``"<salutation>, <addressee>!"``), built up using :class:`Word`, :class:`Literal`, and :class:`And` elements (the :class:`'+'<ParserElement.__add__>` operators create :class:`And` expressions, and the strings are auto-converted to :class:`Literal` expressions):

   from pyparsing import Word, alphas

   # define grammar of a greeting
   greet = Word(alphas) + "," + Word(alphas) + "!"

   hello = "Hello, World!"
   print(hello, "->", greet.parseString(hello))

The program outputs the following:

   Hello, World! -> ['Hello', ',', 'World', '!']

The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operators.

The :class:`ParseResults` object returned from :class:`ParserElement.parseString` can be accessed as a nested list, a dictionary, or an object with named attributes.
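A minimal sketch of those three access styles, reusing the greeting grammar above. The results names ``salutation`` and ``addressee`` are chosen here purely for illustration:

```python
from pyparsing import Word, alphas

# Attach results names so the matched words can be retrieved by key or attribute
greet = (Word(alphas).setResultsName("salutation")
         + ","
         + Word(alphas).setResultsName("addressee")
         + "!")

result = greet.parseString("Hello, World!")
print(result.asList())         # list view: ['Hello', ',', 'World', '!']
print(result[0])               # index access: Hello
print(result["salutation"])    # dictionary-style access: Hello
print(result.addressee)        # attribute access: World
```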

The pyparsing module handles some of the problems that are typically vexing when writing text parsers:

- extra or missing whitespace (the above program will also handle "Hello,World!", "Hello , World !", etc.)
- quoted strings
- embedded comments
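The sketch below illustrates the whitespace and comment handling with the same greeting grammar. The comment-bearing input is made up for the example, and comments are only skipped after :class:`ParserElement.ignore` is called with the module-level ``pythonStyleComment`` expression:

```python
from pyparsing import Word, alphas, pythonStyleComment

greet = Word(alphas) + "," + Word(alphas) + "!"

# Whitespace between tokens is skipped by default, so each variant yields the same tokens
variants = ["Hello, World!", "Hello,World!", "Hello , World !"]
parsed = [greet.parseString(text).asList() for text in variants]
print(parsed)

# ignore() makes the grammar skip comments wherever it would skip whitespace
greet.ignore(pythonStyleComment)
commented = greet.parseString("Hello, # greeting target follows\n World!").asList()
print(commented)   # ['Hello', ',', 'World', '!']
```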

Getting Started

Visit the classes :class:`ParserElement` and :class:`ParseResults` to see the base classes that most other pyparsing classes inherit from. Use the docstrings for examples of how to:

- construct literal match expressions from :class:`Literal` and :class:`CaselessLiteral` classes
- construct character word-group expressions using the :class:`Word` class
- see how to create repetitive expressions using :class:`ZeroOrMore` and :class:`OneOrMore` classes
- use :class:`'+'<And>`, :class:`'|'<MatchFirst>`, :class:`'^'<Or>`, and :class:`'&'<Each>` operators to combine simple expressions into more complex ones
- associate names with your parsed results using :class:`ParserElement.setResultsName`
- access the parsed data, which is returned as a :class:`ParseResults` object
- find some helpful expression short-cuts like :class:`delimitedList` and :class:`oneOf`
- find more useful common expressions in the :class:`pyparsing_common` namespace class

Version: 2.4.6

Author: Paul McGuire <ptmcg@users.sourceforge.net>

Classes

SimpleNamespace

basestring
str(object='') -> string

unicode
str(object='') -> string

ParseBaseException
base exception class for all parsing runtime exceptions

ParseException
Exception thrown when parse expressions don't match the input text; supported attributes by name are:
- lineno - returns the line number of the exception text
- col - returns the column number of the exception text
- line - returns the line containing the exception text
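For instance, when a match fails on the second line of the input, the location attributes can be read from the caught exception roughly as follows (the grammar and input text are illustrative assumptions):

```python
from pyparsing import Word, alphas, nums, ParseException

# Expect a word followed by an integer; the second line holds a word instead
expr = Word(alphas) + Word(nums)
text = "hello\n  world"

caught = None
try:
    expr.parseString(text)
except ParseException as err:
    caught = err
    print(err.lineno, err.col)   # 2 3
    print(repr(err.line))        # '  world'
```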

ParseFatalException
user-throwable exception thrown when inconsistent parse content is found; stops all parsing immediately

ParseSyntaxException
just like :class:`ParseFatalException`, but thrown internally when an :class:`ErrorStop<And._ErrorStop>` ('-' operator) indicates that parsing is to stop immediately because an unbacktrackable syntax error has been found.

RecursiveGrammarException
exception thrown by :class:`ParserElement.validate` if the grammar could be improperly recursive

_ParseResultsWithOffset

ParseResults
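Structured parse results, to provide multiple means of access to the parsed data: as a list (``len(results)``), by list index (``results[0]``, ``results[1]``, etc.), or by attribute (``results.<resultsName>``, see :class:`ParserElement.setResultsName`).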

ParserElement
Abstract base level parser element class.

_PendingSkip

Token
Abstract :class:`ParserElement` subclass, for defining atomic matching patterns.

Empty
An empty token, will always match.

NoMatch
A token that will never match.

Literal
Token to exactly match a specified string.

_SingleCharLiteral

_L
Token to exactly match a specified string.

Keyword
Token to exactly match a specified string as a keyword, that is, it must be immediately followed by a non-keyword character.
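A small comparison of :class:`Literal` and :class:`Keyword`, assuming the default set of keyword characters (letters, digits, ``_`` and ``$``); the input strings are illustrative:

```python
from pyparsing import Literal, Keyword, ParseException

# Literal matches the leading "if" even though it is part of a longer identifier
print(Literal("if").parseString("ifAndOnlyIf").asList())   # ['if']

# Keyword rejects it, because "if" is immediately followed by an identifier character
try:
    Keyword("if").parseString("ifAndOnlyIf")
    keyword_matched = True
except ParseException:
    keyword_matched = False
print(keyword_matched)                                      # False

# Followed by a space, the keyword matches normally
print(Keyword("if").parseString("if x > 0").asList())      # ['if']
```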

CaselessLiteral
Token to match a specified string, ignoring case of letters.

CaselessKeyword
Caseless version of :class:`Keyword`.

CloseMatch
A variation on :class:`Literal` which matches "close" matches, that is, strings with at most 'n' mismatching characters.

Word
Token for matching words composed of allowed character sets.

_WordRegex

Char
A short-cut class for defining ``Word(characters, exact=1)``, when defining a match of any single character in a string of characters.

Regex
Token for matching strings that match a given regular expression.

QuotedString
Token for matching strings that are delimited by quoting characters.

CharsNotIn
Token for matching words composed of characters *not* in a given set (will include whitespace in matched characters if not listed in the provided exclusion set - see example).
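A short sketch of the whitespace behavior mentioned above: since :class:`CharsNotIn` does not skip leading whitespace, the space after the comma ends up in the second field (the ``record`` grammar is illustrative):

```python
from pyparsing import CharsNotIn, Suppress

# Match any run of characters up to (not including) a comma
field = CharsNotIn(",")
record = field + Suppress(",") + field

# The leading space of the second field is kept, because CharsNotIn does not skip it
print(record.parseString("New York, NY").asList())   # ['New York', ' NY']
```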

White
Special matching class for matching whitespace.

_PositionToken

GoToColumn
Token to advance to a specific column of input text; useful for tabular report scraping.

LineStart
Matches if current position is at the beginning of a line within the parse string

LineEnd
Matches if current position is at the end of a line within the parse string

StringStart
Matches if current position is at the beginning of the parse string

StringEnd
Matches if current position is at the end of the parse string

WordStart
Matches if the current position is at the beginning of a Word, and is not preceded by any character in a given set of ``wordChars`` (default: ``printables``).

WordEnd
Matches if the current position is at the end of a Word, and is not followed by any character in a given set of ``wordChars`` (default: ``printables``).

ParseExpression
Abstract subclass of ParserElement, for combining and post-processing parsed tokens.

And
Requires all given :class:`ParseExpression` s to be found in the given order.

Or
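Requires that at least one :class:`ParseExpression` is found. If two expressions match, the expression that matches the longest string will be used.

MatchFirst
Requires that at least one :class:`ParseExpression` is found. If two expressions match, the first one listed is the one that will match.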
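The difference between :class:`Or` and :class:`MatchFirst` shows up when alternatives overlap. In this sketch (the ``integer`` and ``real`` expressions are illustrative), :class:`MatchFirst` settles for the shorter match while :class:`Or` picks the longest:

```python
from pyparsing import Word, nums, Combine

integer = Word(nums)
real = Combine(Word(nums) + "." + Word(nums))

# MatchFirst ('|') stops at the first alternative that matches, in listed order
first_match = (integer | real).parseString("3.1416").asList()
# Or ('^') evaluates every alternative and keeps the longest match
longest = (integer ^ real).parseString("3.1416").asList()

print(first_match)   # ['3']
print(longest)       # ['3.1416']
```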

Each
Requires all given :class:`ParseExpression` s to be found, but in any order.
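A minimal sketch of :class:`Each` built with the ``&`` operator; the ``color``/``size`` settings syntax is invented for the example. The tokens come back in the order they appeared in the input:

```python
from pyparsing import Literal, Word, alphas, nums, Suppress

color = Literal("color") + Suppress("=") + Word(alphas)
size = Literal("size") + Suppress("=") + Word(nums)

# '&' builds an Each: both settings are required, but either may come first
spec = color & size
print(spec.parseString("size=10 color=red").asList())   # ['size', '10', 'color', 'red']
print(spec.parseString("color=red size=10").asList())   # ['color', 'red', 'size', '10']
```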

ParseElementEnhance
Abstract subclass of :class:`ParserElement`, for combining and post-processing parsed tokens.

FollowedBy
Lookahead matching of the given parse expression.

PrecededBy
Lookbehind matching of the given parse expression.

NotAny
Lookahead to disallow matching with the given parse expression.

_MultipleMatch

OneOrMore
Repetition of one or more of the given expression.

ZeroOrMore
Optional repetition of zero or more of the given expression.

_NullToken

Optional
Optional matching of the given expression.

SkipTo
Token for skipping over all undefined text until the matched expression is found.

Forward
Forward declaration of an expression to be defined later - used for recursive grammars, such as algebraic infix notation.
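As an illustration of a recursive grammar, the sketch below uses a :class:`Forward` to parse arbitrarily nested bracketed lists of integers (the bracket-and-comma syntax is an assumption made for the example):

```python
from pyparsing import Forward, Group, Optional, Suppress, Word, ZeroOrMore, nums

LBRACK, RBRACK, COMMA = map(Suppress, "[],")

nested = Forward()          # declared first so it can appear inside its own definition
item = Word(nums) | nested  # an item is a number or another bracketed list
# '<<=' supplies the Forward's definition once the pieces it refers to exist
nested <<= Group(LBRACK + Optional(item + ZeroOrMore(COMMA + item)) + RBRACK)

print(nested.parseString("[1, [2, 3], []]").asList())   # [['1', ['2', '3'], []]]
```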

TokenConverter
Abstract subclass of :class:`ParseElementEnhance`, for converting parsed results.

Combine
Converter to concatenate all matching tokens to a single string.
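A sketch contrasting a plain ``And`` with :class:`Combine` for a simple decimal number (the ``real`` expression is illustrative). By default :class:`Combine` also requires the pieces to be adjacent:

```python
from pyparsing import Word, nums, Combine, ParseException

real = Word(nums) + "." + Word(nums)

print(real.parseString("3.1416").asList())            # ['3', '.', '1416']
print(Combine(real).parseString("3.1416").asList())   # ['3.1416']

# With the default adjacent=True, whitespace between the pieces is rejected
try:
    Combine(real).parseString("3. 1416")
    spaced_ok = True
except ParseException:
    spaced_ok = False
print(spaced_ok)                                       # False
```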

Group
Converter to return the matched tokens as a list - useful for returning tokens of :class:`ZeroOrMore` and :class:`OneOrMore` expressions.

Dict
Converter to return a repetitive expression as a list, but also as a dictionary.

Suppress
Converter for ignoring the results of a parsed expression.
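A one-line illustration, assuming the greeting grammar from the module overview: the punctuation must still be present, but only the words are returned:

```python
from pyparsing import Word, alphas, Suppress

# Punctuation is required for the match but dropped from the results
greet = Word(alphas) + Suppress(",") + Word(alphas) + Suppress("!")
print(greet.parseString("Hello, World!").asList())   # ['Hello', 'World']
```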

OnlyOnce
Wrapper for parse actions, to ensure they are only called once.

pyparsing_common
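Here are some common low-level expressions that may be useful in jump-starting parser development: numeric forms (integers, reals, scientific notation), common programming identifiers, network addresses (MAC, IPv4, IPv6), ISO8601 dates and datetimes, UUIDs, and comma-separated lists.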
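A few of these expressions in use. This is a sketch that assumes the ``integer``, ``real`` and ``ipv4_address`` attributes of :class:`pyparsing_common`; the numeric forms carry parse actions that convert the matched text:

```python
from pyparsing import pyparsing_common as ppc

# integer and real convert the matched text to Python numbers
print(ppc.integer.parseString("42").asList())                # [42]
print(ppc.real.parseString("3.25").asList())                 # [3.25]
# ipv4_address returns the matched text unchanged
print(ppc.ipv4_address.parseString("192.168.0.1").asList())  # ['192.168.0.1']
```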

_lazyclassproperty

unicode_set
A set of Unicode characters, for language-specific strings for ``alphas``, ``nums``, ``alphanums``, and ``printables``.

pyparsing_unicode
A namespace class for defining common language unicode_sets.

pyparsing_test
Namespace class for classes useful in writing unit tests.

Functions

_enable_all_warnings()

unichr(i)
Return a string of one character with ordinal i; 0 <= i < 256.

_ustr(obj)
Drop-in replacement for str(obj) that tries to be Unicode friendly.

_xml_escape(data)
Escape &, <, >, ", ', etc.

conditionAsParseAction(fn, message=None, fatal=False)

col(loc, strg)
Returns current column within a string, counting newlines as line separators.

lineno(loc, strg)
Returns current line number within a string, counting newlines as line separators.

line(loc, strg)
Returns the line of text containing loc within a string, counting newlines as line separators.
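A quick sketch of the three location helpers on a two-line string (the sample text is illustrative); note that ``col`` and ``lineno`` are 1-based:

```python
from pyparsing import col, lineno, line

text = "first line\nsecond line"
loc = text.index("second") + 3   # index of the "o" in "second"

print(lineno(loc, text))   # 2
print(col(loc, text))      # 4
print(line(loc, text))     # second line
```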

_defaultStartDebugAction(instring, loc, expr)

_defaultSuccessDebugAction(instring, startloc, endloc, expr, toks)

_defaultExceptionDebugAction(instring, loc, expr, exc)

nullDebugAction(*args)
'Do-nothing' debug action, to suppress debugging output during parsing.

_trim_arity(func, maxargs=2)

traceParseAction(f)
Decorator for debugging parse actions.

delimitedList(expr, delim=',', combine=False)
Helper to define a delimited list of expressions - the delimiter defaults to ','.
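A sketch of both modes of ``delimitedList``; the inputs and the colon-separated hex example are illustrative:

```python
from pyparsing import Word, alphas, hexnums, delimitedList

# Default: comma-delimited, delimiters suppressed, whitespace around them allowed
names = delimitedList(Word(alphas))
print(names.parseString("aa, bb,cc").asList())   # ['aa', 'bb', 'cc']

# combine=True keeps the delimiters and joins the whole list into one string
mac = delimitedList(Word(hexnums), delim=":", combine=True)
print(mac.parseString("AA:BB:CC").asList())      # ['AA:BB:CC']
```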

countedArray(expr, intExpr=None)
Helper to define a counted list of expressions.

_flatten(L)

matchPreviousLiteral(expr)
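Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression. The repeat must equal the previously matched tokens as literal text; with ``first = Word(nums)``, ``first + ":" + matchPreviousLiteral(first)`` will also match the leading ``"1:1"`` in ``"1:10"``.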

matchPreviousExpr(expr)
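Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression. Unlike :class:`matchPreviousLiteral`, the previous expression is re-evaluated and its tokens must then be identical; with ``first = Word(nums)``, ``first + ":" + matchPreviousExpr(first)`` will *not* match the leading ``"1:1"`` in ``"1:10"``.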
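The two helpers differ in how the repeat is checked. This sketch (expression names are illustrative) shows ``matchPreviousLiteral`` accepting the leading ``1:1`` of ``1:10`` while ``matchPreviousExpr`` rejects it:

```python
from pyparsing import ParseException, Word, matchPreviousExpr, matchPreviousLiteral, nums

# Use separate base expressions: each helper attaches a parse action to its argument
first_a = Word(nums)
by_literal = first_a + ":" + matchPreviousLiteral(first_a)

first_b = Word(nums)
by_expr = first_b + ":" + matchPreviousExpr(first_b)

# The literal repeat of "1" also matches the leading "1:1" of "1:10"
print(by_literal.parseString("1:10").asList())   # ['1', ':', '1']

# The expression repeat re-parses "10" and rejects it because it differs from "1"
try:
    by_expr.parseString("1:10")
    expr_matched = True
except ParseException:
    expr_matched = False
print(expr_matched)                               # False
```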

_escapeRegexRangeChars(s)

oneOf(strs, caseless=False, useRegex=True, asKeyword=False)
Helper to quickly define a set of alternative Literals, making sure to do longest-first testing when there is a conflict, regardless of the input order, while still returning a :class:`MatchFirst` for best performance.
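A sketch of why the longest-first reordering matters, comparing a hand-built :class:`MatchFirst` with ``oneOf`` on a set of comparison operators chosen for the example:

```python
from pyparsing import Literal, oneOf

# A hand-written MatchFirst tries alternatives in the order given, so "<" wins
naive = Literal("<") | Literal("<=")
print(naive.parseString("<= 3").asList())        # ['<']

# oneOf reorders the alternatives so "<=" is tried before "<"
comparison = oneOf("< = > <= >= !=")
print(comparison.parseString("<= 3").asList())   # ['<=']
```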

dictOf(key, value)
Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value.
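A minimal sketch, assuming a made-up ``key: value`` settings syntax; the parsed values are strings because no conversion parse action is attached:

```python
from pyparsing import Suppress, Word, alphas, dictOf, nums

key = Word(alphas)
value = Suppress(":") + Word(nums)

# dictOf(key, value) repeats key + value pairs and indexes each value by its key
settings = dictOf(key, value)
result = settings.parseString("width: 80 height: 24")

print(result["width"], result["height"])   # 80 24
print(result.asDict())                     # {'width': '80', 'height': '24'}
```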

originalTextFor(expr, asString=True)
Helper to return the original, untokenized text for a given expression.
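A sketch showing the difference on a spaced-out date (the ``date`` grammar is illustrative): the ordinary parse returns separate tokens, while ``originalTextFor`` returns the matched slice of the input, including its internal whitespace:

```python
from pyparsing import Word, nums, originalTextFor

date = Word(nums) + "/" + Word(nums) + "/" + Word(nums)
text = "12 / 25 / 2019"

print(date.parseString(text).asList())                    # ['12', '/', '25', '/', '2019']
print(originalTextFor(date).parseString(text).asList())   # ['12 / 25 / 2019']
```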

ungroup(expr)
Helper to undo pyparsing's default grouping of And expressions, even if all but one are non-empty.

locatedExpr(expr)
Helper to decorate a returned token with its starting and ending locations in the input string.

srange(s)
Helper to easily define string ranges for use in Word construction.
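A brief sketch; the character ranges are arbitrary examples:

```python
from pyparsing import Word, srange

# srange expands a regex-style character class into the string of characters it covers
print(srange("[a-f0-3]"))                          # abcdef0123

hex_word = Word(srange("[0-9a-f]"))
print(hex_word.parseString("deadbeef").asList())   # ['deadbeef']
```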

matchOnlyAtCol(n)
Helper method for defining parse actions that require matching at a specific column in the input text.

replaceWith(replStr)
Helper method for common parse actions that simply return a literal value.

removeQuotes(s, l, t)
Helper parse action for removing quotation marks from parsed quoted strings.

tokenMap(func, *args)
Helper to define a parse action by mapping a function to all elements of a ParseResults list.
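The ``replaceWith``, ``removeQuotes`` and ``tokenMap`` helpers can be sketched together as follows. The ``N/A`` placeholder, the quoted sample and the hexadecimal inputs are illustrative, and ``quotedString`` is copied so the module-level expression keeps no parse action:

```python
from pyparsing import Literal, OneOrMore, Word, hexnums, quotedString, removeQuotes, replaceWith, tokenMap

# replaceWith: substitute a fixed value for whatever text was matched
na = Literal("N/A").setParseAction(replaceWith("0"))
print(na.parseString("N/A").asList())                   # ['0']

# removeQuotes: drop the surrounding quote characters from a quoted string
unquoted = quotedString.copy().setParseAction(removeQuotes)
print(unquoted.parseString('"hello world"').asList())   # ['hello world']

# tokenMap: call int(token, 16) on every matched token
hex_ints = OneOrMore(Word(hexnums)).setParseAction(tokenMap(int, 16))
print(hex_ints.parseString("00 11 22 aa FF").asList())  # [0, 17, 34, 170, 255]
```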

upcaseTokens(s, l, t)
(Deprecated) Helper parse action to convert tokens to upper case.

downcaseTokens(s, l, t)
(Deprecated) Helper parse action to convert tokens to lower case.

_makeTags(tagStr, xml, suppress_LT=Suppress:("<"), suppress_GT=Suppress:(">"))
Internal helper to construct opening and closing tag expressions, given a tag name

makeHTMLTags(tagStr)
Helper to construct opening and closing tag expressions for HTML, given a tag name.
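A sketch of extracting links with the generated tag expressions, assuming a small HTML snippet made up for the example. Attributes of the start tag, such as ``href``, are available as named results:

```python
from pyparsing import makeHTMLTags, SkipTo

a_start, a_end = makeHTMLTags("a")
# Capture the link text between the start and end tags under the name "text"
link = a_start + SkipTo(a_end)("text") + a_end

html = '<p>See <a href="https://example.com/docs">the docs</a> for details.</p>'
found = [(m.href, m.text) for m in link.searchString(html)]
print(found)   # [('https://example.com/docs', 'the docs')]
```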

makeXMLTags(tagStr)
Helper to construct opening and closing tag expressions for XML, given a tag name.

withAttribute(*args, **attrDict)
Helper to create a validating parse action to be used with start tags created with :class:`makeXMLTags` or :class:`makeHTMLTags`.

withClass(classname, namespace='')
Simplified version of :class:`withAttribute` when matching on a div class - made difficult because ``class`` is a reserved word in Python.

infixNotation(baseExpr, opList, lpar=Suppress:("("), rpar=Suppress:(")"))
Helper method for constructing grammars of expressions made up of operators working in a precedence hierarchy.
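A minimal sketch of a two-level arithmetic grammar (the operator table is illustrative). Operator levels are listed from highest to lowest precedence, and each matched operation is returned as a nested group:

```python
from pyparsing import Word, infixNotation, nums, oneOf, opAssoc

integer = Word(nums).setParseAction(lambda t: int(t[0]))

# Highest precedence first: '*' and '/' bind tighter than '+' and '-'
expr = infixNotation(integer, [
    (oneOf("* /"), 2, opAssoc.LEFT),
    (oneOf("+ -"), 2, opAssoc.LEFT),
])

print(expr.parseString("1 + 2 * 3").asList())   # [[1, '+', [2, '*', 3]]]
```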

operatorPrecedence(baseExpr, opList, lpar=Suppress:("("), rpar=Suppress:(")"))
(Deprecated) Former name of :class:`infixNotation`, will be dropped in a future release.

nestedExpr(opener='(', closer=')', content=None, ignoreExpr=quotedString using single or double quotes)
Helper method for defining nested lists enclosed in opening and closing delimiters ("(" and ")" are the default).
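A sketch using the default parentheses and then custom braces; the input strings are illustrative:

```python
from pyparsing import nestedExpr

# Defaults: "(" and ")" delimiters; whitespace-separated content becomes nested lists
print(nestedExpr().parseString("(a (b c) d)").asList())         # [['a', ['b', 'c'], 'd']]

# Custom opening and closing delimiters
print(nestedExpr("{", "}").parseString("{x {y} z}").asList())   # [['x', ['y'], 'z']]
```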

indentedBlock(blockStatementExpr, indentStack, indent=True)
Helper method for defining space-delimited indentation blocks, such as those used to define block statements in Python source code.

replaceHTMLEntity(t)
Helper parse action to replace common HTML entities with their special characters.
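A sketch that attaches this parse action to a copy of the predefined ``commonHTMLEntity`` expression and uses ``transformString`` to rewrite a sample string (the sample text is illustrative):

```python
from pyparsing import commonHTMLEntity, replaceHTMLEntity

# Copy the predefined expression so the module-level one keeps no parse action
unescape = commonHTMLEntity.copy().setParseAction(replaceHTMLEntity)
print(unescape.transformString("a &lt; b &amp;&amp; c &gt; d"))   # a < b && c > d
```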

Variables

__doc__ = ...

__versionTime__ = '24 Dec 2019 04:27 UTC'

__compat__ = SimpleNamespace()

__diag__ = SimpleNamespace()

system_version = (2, 7, 16)

PY_3 = False

_MAX_INT = 9223372036854775807

singleArgBuiltins = [<built-in function sum>, <built-in functi...

alphas = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

nums = '0123456789'

hexnums = '0123456789ABCDEFabcdef'

alphanums = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvw...

_bslash = '\\'

printables = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKL...

empty = empty

lineStart = lineStart

lineEnd = lineEnd

stringStart = stringStart

stringEnd = stringEnd

_escapedPunc = W:(\, \[]-...)

_escapedHexChar = Re:('\\\\0?[xX][0-9a-fA-F]+')

_escapedOctChar = Re:('\\\\0[0-7]+')

_singleChar = {W:(\, \[]-...) | Re:('\\\\0?[xX][0-9a-fA-F]+') ...

_charRange = Group:({{W:(\, \[]-...) | Re:('\\\\0?[xX][0-9a-fA...

_reBracketExpr = {"[" ["^"] Group:({{Group:({{W:(\, \[]-...) |...

opAssoc = SimpleNamespace()

dblQuotedString = string enclosed in double quotes

sglQuotedString = string enclosed in single quotes

quotedString = quotedString using single or double quotes

unicodeString = unicode string literal

alphas8bit = u'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîï...

punc8bit = u'¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿×÷'

_htmlEntityMap = {'amp': '&', 'apos': '\'', 'gt': '>', 'lt': '...

commonHTMLEntity = common HTML entity

cStyleComment = C style comment
Comment of the form ``/* ... */``

htmlComment = HTML comment
Comment of the form ``<!-- ... -->``

restOfLine = rest of line

dblSlashComment = // comment
Comment of the form ``// ... (to end of line)``

cppStyleComment = C++ style comment
Comment of either form :class:`cStyleComment` or :class:`dblSlashComment`

javaStyleComment = C++ style comment
Same as :class:`cppStyleComment`

pythonStyleComment = Python style comment
Comment of the form ``# ... (to end of line)``

_commasepitem = commaItem

commaSeparatedList = commaSeparatedList
(Deprecated) Predefined expression of 1 or more printable words or quoted strings, separated by commas.

__package__ = None

anyCloseTag = </any tag>

anyOpenTag = <any tag>

fname = 'max'

nm = '__doc__'
