Module pyparsing

pyparsing module - Classes and methods to define and execute parsing grammars

The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional lex/yacc approach, or the use of regular expressions. With pyparsing, you don't need to learn a new syntax for defining grammars or matching expressions - the pyparsing module provides a library of classes that you use to construct the grammar directly in Python.

Here is a program to parse "Hello, World!" (or any greeting of the form ``"<salutation>, <addressee>!"``), built up using :class:`Word`, :class:`Literal`, and :class:`And` elements (the :class:`'+'<ParserElement.__add__>` operators create :class:`And` expressions, and the strings are auto-converted to :class:`Literal` expressions):

   from pyparsing import Word, alphas

   # define grammar of a greeting
   greet = Word(alphas) + "," + Word(alphas) + "!"

   hello = "Hello, World!"
   print(hello, "->", greet.parseString(hello))

The program outputs the following:

   Hello, World! -> ['Hello', ',', 'World', '!']
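Because pyparsing skips whitespace between tokens by default, the same grammar should accept other spacings of the greeting. A minimal sketch; the extra input strings are illustrative:

```python
from pyparsing import Word, alphas

# same greeting grammar as above
greet = Word(alphas) + "," + Word(alphas) + "!"

# whitespace between tokens is skipped by default, so the spacing does not change the result
results = {}
for text in ("Hello, World!", "Hello,World!", "Hello  ,  World  !"):
    results[text] = greet.parseString(text).asList()
    print(text, "->", results[text])
```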

The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operators.

The :class:`ParseResults` object returned from :class:`ParserElement.parseString` can be accessed as a nested list, a dictionary, or an object with named attributes.
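A minimal sketch of the three access styles. The results names ``salutation`` and ``addressee`` are hypothetical, attached with the ``expr("name")`` shorthand for :class:`ParserElement.setResultsName`:

```python
from pyparsing import Word, alphas

# attach (hypothetical) results names with the expr("name") shorthand
greet = Word(alphas)("salutation") + "," + Word(alphas)("addressee") + "!"
result = greet.parseString("Hello, World!")

print(result[0])            # list-style access by index: Hello
print(result["addressee"])  # dict-style access by results name: World
print(result.salutation)    # attribute-style access: Hello
```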

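The pyparsing module handles some of the problems that are typically vexing when writing text parsers:

  • extra or missing whitespace (the above program will also handle "Hello,World!", "Hello  ,  World  !", etc.)
  • quoted strings
  • embedded comments

Getting Started

Visit the classes :class:`ParserElement` and :class:`ParseResults` to see the base classes that most other pyparsing classes inherit from. Use the docstrings for examples of how to:

  • construct literal match expressions from :class:`Literal` and :class:`CaselessLiteral` classes
  • construct character word-group expressions using the :class:`Word` class
  • see how to create repetitive expressions using :class:`ZeroOrMore` and :class:`OneOrMore` classes
  • use :class:`'+'<And>`, :class:`'|'<MatchFirst>`, :class:`'^'<Or>`, and :class:`'&'<Each>` operators to combine simple expressions into more complex ones
  • associate names with your parsed results using :class:`ParserElement.setResultsName`
  • access the parsed data, which is returned as a :class:`ParseResults` object
  • find some helpful expression short-cuts like :class:`delimitedList` and :class:`oneOf`
  • find more useful common expressions in the :class:`pyparsing_common` namespace class
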
Version: 2.4.6

Author: Paul McGuire <ptmcg@users.sourceforge.net>

Classes
  SimpleNamespace
  basestring
str(object='') -> string
  unicode
str(object='') -> string
  ParseBaseException
base exception class for all parsing runtime exceptions
  ParseException
Exception thrown when parse expressions don't match the input text; supported attributes by name are: ``lineno`` (returns the line number of the exception text), ``col`` (returns the column number of the exception text), and ``line`` (returns the line containing the exception text).
  ParseFatalException
user-throwable exception thrown when inconsistent parse content is found; stops all parsing immediately
  ParseSyntaxException
Just like :class:`ParseFatalException`, but thrown internally when an :class:`ErrorStop<And._ErrorStop>` ('-' operator) indicates that parsing is to stop immediately because an unbacktrackable syntax error has been found.
  RecursiveGrammarException
exception thrown by :class:`ParserElement.validate` if the grammar could be improperly recursive
  _ParseResultsWithOffset
  ParseResults
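Structured parse results, to provide multiple means of access to the parsed data: as a list (``len(results)``), by list index (``results[0], results[1]``, etc.), or by attribute (``results.<resultsName>``; see :class:`ParserElement.setResultsName`).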
  ParserElement
Abstract base level parser element class.
  _PendingSkip
  Token
Abstract :class:`ParserElement` subclass, for defining atomic matching patterns.
  Empty
An empty token that will always match.
  NoMatch
A token that will never match.
  Literal
Token to exactly match a specified string.
  _SingleCharLiteral
  _L
Token to exactly match a specified string.
  Keyword
Token to exactly match a specified string as a keyword, that is, it must be immediately followed by a non-keyword character.
  CaselessLiteral
Token to match a specified string, ignoring case of letters.
  CaselessKeyword
Caseless version of :class:`Keyword`.
  CloseMatch
A variation on :class:`Literal` which matches "close" matches, that is, strings with at most 'n' mismatching characters.
  Word
Token for matching words composed of allowed character sets.
  _WordRegex
  Char
A short-cut class for defining ``Word(characters, exact=1)``, when defining a match of any single character in a string of characters.
  Regex
Token for matching strings that match a given regular expression.
  QuotedString
Token for matching strings that are delimited by quoting characters.
  CharsNotIn
Token for matching words composed of characters *not* in a given set (will include whitespace in matched characters if not listed in the provided exclusion set - see example).
  White
Special matching class for matching whitespace.
  _PositionToken
  GoToColumn
Token to advance to a specific column of input text; useful for tabular report scraping.
  LineStart
Matches if current position is at the beginning of a line within the parse string
  LineEnd
Matches if current position is at the end of a line within the parse string
  StringStart
Matches if current position is at the beginning of the parse string
  StringEnd
Matches if current position is at the end of the parse string
  WordStart
Matches if the current position is at the beginning of a Word, and is not preceded by any character in a given set of ``wordChars`` (default= ``printables``).
  WordEnd
Matches if the current position is at the end of a Word, and is not followed by any character in a given set of ``wordChars`` (default= ``printables``).
  ParseExpression
Abstract subclass of ParserElement, for combining and post-processing parsed tokens.
  And
Requires all given :class:`ParseExpression` s to be found in the given order.
  Or
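Requires that at least one :class:`ParseExpression` is found; if two expressions match, the one that matches the longest string is used.
  MatchFirst
Requires that at least one :class:`ParseExpression` is found; if two expressions match, the first one listed is used.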
  Each
Requires all given :class:`ParseExpression` s to be found, but in any order.
  ParseElementEnhance
Abstract subclass of :class:`ParserElement`, for combining and post-processing parsed tokens.
  FollowedBy
Lookahead matching of the given parse expression.
  PrecededBy
Lookbehind matching of the given parse expression.
  NotAny
Lookahead to disallow matching with the given parse expression.
  _MultipleMatch
  OneOrMore
Repetition of one or more of the given expression.
  ZeroOrMore
Optional repetition of zero or more of the given expression.
  _NullToken
  Optional
Optional matching of the given expression.
  SkipTo
Token for skipping over all undefined text until the matched expression is found.
  Forward
Forward declaration of an expression to be defined later - used for recursive grammars, such as algebraic infix notation.
  TokenConverter
Abstract subclass of :class:`ParseElementEnhance`, for converting parsed results.
  Combine
Converter to concatenate all matching tokens to a single string.
  Group
Converter to return the matched tokens as a list - useful for returning tokens of :class:`ZeroOrMore` and :class:`OneOrMore` expressions.
  Dict
Converter to return a repetitive expression as a list, but also as a dictionary.
  Suppress
Converter for ignoring the results of a parsed expression.
  OnlyOnce
Wrapper for parse actions, to ensure they are only called once.
  pyparsing_common
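Here are some common low-level expressions that may be useful in jump-starting parser development: numeric forms, common programming identifiers, network addresses, ISO8601 dates and datetimes, UUIDs, and comma-separated lists.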
  _lazyclassproperty
  unicode_set
A set of Unicode characters, for language-specific strings for ``alphas``, ``nums``, ``alphanums``, and ``printables``.
  pyparsing_unicode
A namespace class for defining common language unicode_sets.
  pyparsing_test
Namespace class for classes useful in writing unit tests.
Functions

_enable_all_warnings()

unichr(i)
Return a string of one character with ordinal i; 0 <= i < 256.

_ustr(obj)
Drop-in replacement for str(obj) that tries to be Unicode friendly.

_xml_escape(data)
Escape &, <, >, ", ', etc.

conditionAsParseAction(fn, message=None, fatal=False)

col(loc, strg)
Returns current column within a string, counting newlines as line separators.

lineno(loc, strg)
Returns current line number within a string, counting newlines as line separators.

line(loc, strg)
Returns the line of text containing loc within a string, counting newlines as line separators.

_defaultStartDebugAction(instring, loc, expr)

_defaultSuccessDebugAction(instring, startloc, endloc, expr, toks)

_defaultExceptionDebugAction(instring, loc, expr, exc)

nullDebugAction(*args)
'Do-nothing' debug action, to suppress debugging output during parsing.
 
_trim_arity(func, maxargs=2)

traceParseAction(f)
Decorator for debugging parse actions.

delimitedList(expr, delim=',', combine=False)
Helper to define a delimited list of expressions - the delimiter defaults to ','.

countedArray(expr, intExpr=None)
Helper to define a counted list of expressions.

_flatten(L)

matchPreviousLiteral(expr)
Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression, matching the same literal text.

matchPreviousExpr(expr)
Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression, re-parsing with the same expression and comparing the resulting tokens.

_escapeRegexRangeChars(s)

oneOf(strs, caseless=False, useRegex=True, asKeyword=False)
Helper to quickly define a set of alternative Literals. It makes sure to do longest-first testing when there is a conflict, regardless of the input order, but returns a :class:`MatchFirst` for best performance.

dictOf(key, value)
Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value.

originalTextFor(expr, asString=True)
Helper to return the original, untokenized text for a given expression.

ungroup(expr)
Helper to undo pyparsing's default grouping of And expressions, even if all but one are non-empty.
 
locatedExpr(expr)
Helper to decorate a returned token with its starting and ending locations in the input string.

srange(s)
Helper to easily define string ranges for use in Word construction.

matchOnlyAtCol(n)
Helper method for defining parse actions that require matching at a specific column in the input text.

replaceWith(replStr)
Helper method for common parse actions that simply return a literal value.

removeQuotes(s, l, t)
Helper parse action for removing quotation marks from parsed quoted strings.

tokenMap(func, *args)
Helper to define a parse action by mapping a function to all elements of a ParseResults list.

upcaseTokens(s, l, t)
(Deprecated) Helper parse action to convert tokens to upper case.

downcaseTokens(s, l, t)
(Deprecated) Helper parse action to convert tokens to lower case.

_makeTags(tagStr, xml, suppress_LT=Suppress:("<"), suppress_GT=Suppress:(">"))
Internal helper to construct opening and closing tag expressions, given a tag name.
 
makeHTMLTags(tagStr)
Helper to construct opening and closing tag expressions for HTML, given a tag name.

makeXMLTags(tagStr)
Helper to construct opening and closing tag expressions for XML, given a tag name.

withAttribute(*args, **attrDict)
Helper to create a validating parse action to be used with start tags created with :class:`makeXMLTags` or :class:`makeHTMLTags`.

withClass(classname, namespace='')
Simplified version of :class:`withAttribute` when matching on a div class - made difficult because ``class`` is a reserved word in Python.

infixNotation(baseExpr, opList, lpar=Suppress:("("), rpar=Suppress:(")"))
Helper method for constructing grammars of expressions made up of operators working in a precedence hierarchy.

operatorPrecedence(baseExpr, opList, lpar=Suppress:("("), rpar=Suppress:(")"))
(Deprecated) Former name of :class:`infixNotation`, will be dropped in a future release.

nestedExpr(opener='(', closer=')', content=None, ignoreExpr=quotedString using single or double quotes)
Helper method for defining nested lists enclosed in opening and closing delimiters ("(" and ")" are the default).

indentedBlock(blockStatementExpr, indentStack, indent=True)
Helper method for defining space-delimited indentation blocks, such as those used to define block statements in Python source code.

replaceHTMLEntity(t)
Helper parse action to replace common HTML entities with their special characters.
Variables
  __doc__ = ...
  __versionTime__ = '24 Dec 2019 04:27 UTC'
  __compat__ = SimpleNamespace()
  __diag__ = SimpleNamespace()
  system_version = (2, 7, 16)
  PY_3 = False
  _MAX_INT = 9223372036854775807
  singleArgBuiltins = [<built-in function sum>, <built-in functi...
  alphas = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
  nums = '0123456789'
  hexnums = '0123456789ABCDEFabcdef'
  alphanums = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvw...
  _bslash = '\\'
  printables = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKL...
  empty = empty
  lineStart = lineStart
  lineEnd = lineEnd
  stringStart = stringStart
  stringEnd = stringEnd
  _escapedPunc = W:(\, \[]-...)
  _escapedHexChar = Re:('\\\\0?[xX][0-9a-fA-F]+')
  _escapedOctChar = Re:('\\\\0[0-7]+')
  _singleChar = {W:(\, \[]-...) | Re:('\\\\0?[xX][0-9a-fA-F]+') ...
  _charRange = Group:({{W:(\, \[]-...) | Re:('\\\\0?[xX][0-9a-fA...
  _reBracketExpr = {"[" ["^"] Group:({{Group:({{W:(\, \[]-...) |...
  opAssoc = SimpleNamespace()
  dblQuotedString = string enclosed in double quotes
  sglQuotedString = string enclosed in single quotes
  quotedString = quotedString using single or double quotes
  unicodeString = unicode string literal
  alphas8bit = u'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîï...
  punc8bit = u'¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿×÷'
  _htmlEntityMap = {'amp': '&', 'apos': '\'', 'gt': '>', 'lt': '...
  commonHTMLEntity = common HTML entity
  cStyleComment = C style comment
Comment of the form ``/* ... */``
  htmlComment = HTML comment
Comment of the form ``<!-- ... -->``
  restOfLine = rest of line
  dblSlashComment = // comment
Comment of the form ``// ...`` (to end of line)
  cppStyleComment = C++ style comment
Comment of either form :class:`cStyleComment` or :class:`dblSlashComment`
  javaStyleComment = C++ style comment
Same as :class:`cppStyleComment`
  pythonStyleComment = Python style comment
Comment of the form ``# ...`` (to end of line)
  _commasepitem = commaItem
  commaSeparatedList = commaSeparatedList
(Deprecated) Predefined expression of 1 or more printable words or quoted strings, separated by commas.
  __package__ = None
  anyCloseTag = </any tag>
  anyOpenTag = <any tag>
  fname = 'max'
  nm = '__doc__'
Function Details

_ustr(obj)

Drop-in replacement for str(obj) that tries to be Unicode friendly. It first tries str(obj). If that fails with a UnicodeEncodeError, then it tries unicode(obj), and then either returns the unicode object or encodes it with the default encoding.

_xml_escape(data)

Escape &, <, >, ", ', etc. in a string of data.

col(loc, strg)

Returns current column within a string, counting newlines as line separators. The first column is number 1.

Note: the default parsing behavior is to expand tabs in the input string before starting the parsing process. See :class:`ParserElement.parseString` for more information on parsing strings containing ``<TAB>`` s, and suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column positions within the parsed string.

lineno(loc, strg)

Returns current line number within a string, counting newlines as line separators. The first line is number 1.

Note - the default parsing behavior is to expand tabs in the input string before starting the parsing process. See :class:`ParserElement.parseString` for more information on parsing strings containing ``<TAB>`` s, and suggested methods to maintain a consistent view of the parsed string, the parse location, and line and column positions within the parsed string.
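A short sketch of the 1-based line and column conventions on a hypothetical three-line string; a :class:`ParseException` reports its ``lineno``, ``col``, and ``line`` attributes the same way:

```python
from pyparsing import Word, nums, col, lineno, line, ParseException

text = "abc\ndef\nghi"
loc = text.index("e")  # character offset 5, on the second line

print(lineno(loc, text))  # 2 (first line is 1)
print(col(loc, text))     # 2 (first column is 1)
print(line(loc, text))    # def

# a failed match reports its location using the same conventions
try:
    Word(nums).parseString("abc")
except ParseException as err:
    err_info = (err.lineno, err.col, err.line)
    print(err_info)  # (1, 1, 'abc')
```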

traceParseAction(f)

Decorator for debugging parse actions.

When the parse action is called, this decorator will print ``">> entering method-name(line:<current_source_line>, <parse_location>, <matched_tokens>)"``. When the parse action completes, the decorator will print ``"<<"`` followed by the returned value, or any exception that the parse action raised.

Example:

   from pyparsing import Word, alphas, OneOrMore, traceParseAction

   wd = Word(alphas)

   @traceParseAction
   def remove_duplicate_chars(tokens):
       return ''.join(sorted(set(''.join(tokens))))

   wds = OneOrMore(wd).setParseAction(remove_duplicate_chars)
   print(wds.parseString("slkdjs sld sldd sdlf sdljf"))

prints:

   >>entering remove_duplicate_chars(line: 'slkdjs sld sldd sdlf sdljf', 0, (['slkdjs', 'sld', 'sldd', 'sdlf', 'sdljf'], {}))
   <<leaving remove_duplicate_chars (ret: 'dfjkls')
   ['dfjkls']

delimitedList(expr, delim=',', combine=False)

Helper to define a delimited list of expressions; the delimiter defaults to ','. By default, the list elements and delimiters can have intervening whitespace and comments, but this can be overridden by passing ``combine=True``. If ``combine`` is set to ``True``, the matching tokens are returned as a single token string, with the delimiters included; otherwise, the matching tokens are returned as a list of tokens, with the delimiters suppressed.

Example:

   from pyparsing import Word, alphas, hexnums, delimitedList

   delimitedList(Word(alphas)).parseString("aa,bb,cc") # -> ['aa', 'bb', 'cc']
   delimitedList(Word(hexnums), delim=':', combine=True).parseString("AA:BB:CC:DD:EE") # -> ['AA:BB:CC:DD:EE']

countedArray(expr, intExpr=None)

Helper to define a counted list of expressions.

This helper defines a pattern of the form:

   integer expr expr expr...

where the leading integer tells how many expr expressions follow. The matched tokens return the array of expr tokens as a list; the leading count token is suppressed.

If ``intExpr`` is specified, it should be a pyparsing expression that produces an integer value.

Example:

   from pyparsing import Word, alphas, countedArray

   countedArray(Word(alphas)).parseString('2 ab cd ef')  # -> ['ab', 'cd']

   # in this parser, the leading integer value is given in binary,
   # '10' indicating that 2 values are in the array
   binaryConstant = Word('01').setParseAction(lambda t: int(t[0], 2))
   countedArray(Word(alphas), intExpr=binaryConstant).parseString('10 ab cd ef')  # -> ['ab', 'cd']

matchPreviousLiteral(expr)

Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression. For example:

   from pyparsing import Word, nums, matchPreviousLiteral

   first = Word(nums)
   second = matchPreviousLiteral(first)
   matchExpr = first + ":" + second

will match ``"1:1"``, but not ``"1:2"``. Because this matches a previous literal, it will also match the leading ``"1:1"`` in ``"1:10"``. If this is not desired, use :class:`matchPreviousExpr`. Do *not* use with packrat parsing enabled.

matchPreviousExpr(expr)

Helper to define an expression that is indirectly defined from the tokens matched in a previous expression, that is, it looks for a 'repeat' of a previous expression. For example:

   from pyparsing import Word, nums, matchPreviousExpr

   first = Word(nums)
   second = matchPreviousExpr(first)
   matchExpr = first + ":" + second

will match ``"1:1"``, but not ``"1:2"``. Because this matches by expressions, it will *not* match the leading ``"1:1"`` in ``"1:10"``; the expressions are evaluated first, and then compared, so ``"1"`` is compared with ``"10"``. Do *not* use with packrat parsing enabled.
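The difference between the two helpers shows up on the input ``"1:10"``. A minimal sketch; since each helper attaches a parse action to the expression passed in, each grammar below gets its own ``Word(nums)``:

```python
from pyparsing import Word, nums, matchPreviousLiteral, matchPreviousExpr, ParseException

# each helper adds a parse action to its argument, so use a separate Word for each grammar
first_lit = Word(nums)
literal_repeat = first_lit + ":" + matchPreviousLiteral(first_lit)

first_expr = Word(nums)
expr_repeat = first_expr + ":" + matchPreviousExpr(first_expr)

# the literal repeat matches the leading "1" of "10"
print(literal_repeat.parseString("1:10").asList())  # ['1', ':', '1']

# the expression repeat parses "10", compares it with "1", and fails
try:
    expr_repeat.parseString("1:10")
    expr_matched = True
except ParseException:
    expr_matched = False
print(expr_matched)  # False
```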

oneOf(strs, caseless=False, useRegex=True, asKeyword=False)

Helper to quickly define a set of alternative Literals. It makes sure to do longest-first testing when there is a conflict, regardless of the input order, but returns a :class:`MatchFirst` for best performance.

Parameters:

  • strs - a string of space-delimited literals, or a collection of string literals
  • caseless - (default= ``False``) - treat all literals as caseless
  • useRegex - (default= ``True``) - as an optimization, will generate a Regex object; otherwise, will generate a :class:`MatchFirst` object (if ``caseless=True`` or ``asKeyword=True``, or if creating a :class:`Regex` raises an exception)
  • asKeyword - (default= ``False``) - enforce Keyword-style matching on the generated expressions

Example:

   from pyparsing import Word, alphas, nums, oneOf

   comp_oper = oneOf("< = > <= >= !=")
   var = Word(alphas)
   number = Word(nums)
   term = var | number
   comparison_expr = term + comp_oper + term
   print(comparison_expr.searchString("B = 12  AA=23 B<=AA AA>12"))

prints:

   [['B', '=', '12'], ['AA', '=', '23'], ['B', '<=', 'AA'], ['AA', '>', '12']]
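The longest-first reordering matters when one alternative is a prefix of another. A small sketch contrasting :class:`oneOf` with a hand-built :class:`MatchFirst` listed in the same (shortest-first) order:

```python
from pyparsing import oneOf, Literal

# alternatives deliberately listed shortest-first
ops = oneOf("< <=")
print(ops.parseString("<=").asList())    # ['<=']

# a hand-built MatchFirst in the same order stops at the first alternative that matches
naive = Literal("<") | Literal("<=")
print(naive.parseString("<=").asList())  # ['<']
```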

dictOf(key, value)

Helper to easily and clearly define a dictionary by specifying the respective patterns for the key and value. Takes care of defining the :class:`Dict`, :class:`ZeroOrMore`, and :class:`Group` tokens in the proper order. The key pattern can include delimiting markers or punctuation, as long as they are suppressed, thereby leaving the significant key text. The value pattern can include named results, so that the :class:`Dict` results can include named token fields.

Example:

   from pyparsing import Word, alphas, Suppress, OneOrMore, FollowedBy, dictOf

   text = "shape: SQUARE posn: upper left color: light blue texture: burlap"

   # a label is a word followed by a colon; data words are the words after it
   data_word = Word(alphas)
   label = data_word + FollowedBy(':')

   attr_expr = (label + Suppress(':') + OneOrMore(data_word, stopOn=label).setParseAction(' '.join))
   print(OneOrMore(attr_expr).parseString(text).dump())

   attr_label = label
   attr_value = Suppress(':') + OneOrMore(data_word, stopOn=label).setParseAction(' '.join)

   # similar to Dict, but simpler call format
   result = dictOf(attr_label, attr_value).parseString(text)
   print(result.dump())
   print(result['shape'])
   print(result.shape)  # object attribute access works too
   print(result.asDict())

prints:

   ['shape', 'SQUARE', 'posn', 'upper left', 'color', 'light blue', 'texture', 'burlap']
   [['shape', 'SQUARE'], ['posn', 'upper left'], ['color', 'light blue'], ['texture', 'burlap']]
   - color: light blue
   - posn: upper left
   - shape: SQUARE
   - texture: burlap
   SQUARE
   SQUARE
   {'color': 'light blue', 'shape': 'SQUARE', 'posn': 'upper left', 'texture': 'burlap'}

originalTextFor(expr, asString=True)

Helper to return the original, untokenized text for a given expression. Useful to restore the parsed fields of an HTML start tag into the raw tag text itself, or to revert separate tokens with intervening whitespace back to the original matching input text. By default, returns a string containing the original parsed text.

If the optional ``asString`` argument is passed as ``False``, then the return value is a :class:`ParseResults` containing any results names that were originally matched, and a single token containing the original matched text from the input string. So if the expression passed to :class:`originalTextFor` contains expressions with defined results names, you must set ``asString`` to ``False`` if you want to preserve those results name values.

Example:

   from pyparsing import makeHTMLTags, originalTextFor, SkipTo

   src = "this is test <b> bold <i>text</i> </b> normal text "
   for tag in ("b", "i"):
       opener, closer = makeHTMLTags(tag)
       patt = originalTextFor(opener + SkipTo(closer) + closer)
       print(patt.searchString(src)[0])

prints:

   ['<b> bold <i>text</i> </b>']
   ['<i>text</i>']

locatedExpr(expr)

Helper to decorate a returned token with its starting and ending locations in the input string.

This helper adds the following results names:

  • locn_start = location where matched expression begins
  • locn_end = location where matched expression ends
  • value = the actual parsed results

Be careful if the input text contains ``<TAB>`` characters; you may want to call :class:`ParserElement.parseWithTabs`.

Example:

   from pyparsing import Word, alphas, locatedExpr

   wd = Word(alphas)
   for match in locatedExpr(wd).searchString("ljsdf123lksdjjf123lkkjj1222"):
       print(match)

prints:

   [[0, 'ljsdf', 5]]
   [[8, 'lksdjjf', 15]]
   [[18, 'lkkjj', 23]]

srange(s)

Helper to easily define string ranges for use in Word construction. Borrows syntax from regexp '[]' string range definitions:

   srange("[0-9]")   -> "0123456789"
   srange("[a-z]")   -> "abcdefghijklmnopqrstuvwxyz"
   srange("[a-z$_]") -> "abcdefghijklmnopqrstuvwxyz$_"

The input string must be enclosed in []'s, and the returned string is the expanded character set joined into a single string. The values enclosed in the []'s may be:

  • a single character
  • an escaped character with a leading backslash (such as ``\-`` or ``\]``)
  • an escaped hex character with a leading ``'\x'`` (``\x21``, which is a ``'!'`` character) (``\0x##`` is also supported for backwards compatibility)
  • an escaped octal character with a leading ``'\0'`` (``\041``, which is a ``'!'`` character)
  • a range of any of the above, separated by a dash (``'a-z'``, etc.)
  • any combination of the above (``'aeiouy'``, ``'a-zA-Z0-9_$'``, etc.)
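A sketch of feeding :class:`srange` results into a :class:`Word`; the identifier character rules are illustrative and not taken from any particular language:

```python
from pyparsing import Word, srange

print(srange("[0-9]"))  # 0123456789

# initial and body character sets for a simple (hypothetical) identifier
ident_start = srange("[a-zA-Z_$]")
ident_body = srange("[a-zA-Z0-9_$]")
print(len(ident_body))  # 64: 26 + 26 letters, 10 digits, '_' and '$'

identifier = Word(ident_start, ident_body)
print(identifier.parseString("_tmp$1 = 5").asList())  # ['_tmp$1']
```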

replaceWith(replStr)

Helper method for common parse actions that simply return a literal value. Especially useful when used with :class:`transformString<ParserElement.transformString>`.

Example:

   import math
   from pyparsing import Word, nums, oneOf, OneOrMore, replaceWith

   num = Word(nums).setParseAction(lambda toks: int(toks[0]))
   na = oneOf("N/A NA").setParseAction(replaceWith(math.nan))
   term = na | num

   OneOrMore(term).parseString("324 234 N/A 234") # -> [324, 234, nan, 234]
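Since ``replaceWith`` is especially useful with :class:`transformString<ParserElement.transformString>`, here is a minimal sketch of that use; the replacement text ``"missing"`` is a hypothetical choice:

```python
from pyparsing import oneOf, replaceWith

# replace either placeholder spelling with a fixed marker, leaving the surrounding text as-is
na = oneOf("N/A NA").setParseAction(replaceWith("missing"))
cleaned = na.transformString("324 234 N/A 234")
print(cleaned)  # 324 234 missing 234
```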

removeQuotes(s, l, t)

Helper parse action for removing quotation marks from parsed quoted strings.

Example:

   from pyparsing import quotedString, removeQuotes

   # by default, quotation marks are included in parsed results
   quotedString.parseString("'Now is the Winter of our Discontent'") # -> ["'Now is the Winter of our Discontent'"]

   # use removeQuotes to strip quotation marks from parsed results (on a copy, leaving the shared quotedString unchanged)
   unquoted = quotedString.copy().setParseAction(removeQuotes)
   unquoted.parseString("'Now is the Winter of our Discontent'") # -> ["Now is the Winter of our Discontent"]

tokenMap(func, *args)

Helper to define a parse action by mapping a function to all elements of a ParseResults list. If any additional args are passed, they are forwarded to the given function as additional arguments after the token, as in ``hex_integer = Word(hexnums).setParseAction(tokenMap(int, 16))``, which will convert the parsed data to an integer using base 16.

Example (compare the last example to the one in :class:`ParserElement.transformString`):

   from pyparsing import Word, hexnums, alphas, OneOrMore, tokenMap

   hex_ints = OneOrMore(Word(hexnums)).setParseAction(tokenMap(int, 16))
   hex_ints.runTests('''
       00 11 22 aa FF 0a 0d 1a
       ''')

   upperword = Word(alphas).setParseAction(tokenMap(str.upper))
   OneOrMore(upperword).runTests('''
       my kingdom for a horse
       ''')

   wd = Word(alphas).setParseAction(tokenMap(str.title))
   OneOrMore(wd).setParseAction(' '.join).runTests('''
       now is the winter of our discontent made glorious summer by this sun of york
       ''')

prints:

   00 11 22 aa FF 0a 0d 1a
   [0, 17, 34, 170, 255, 10, 13, 26]

   my kingdom for a horse
   ['MY', 'KINGDOM', 'FOR', 'A', 'HORSE']

   now is the winter of our discontent made glorious summer by this sun of york
   ['Now Is The Winter Of Our Discontent Made Glorious Summer By This Sun Of York']

upcaseTokens(s, l, t)

(Deprecated) Helper parse action to convert tokens to upper case. Deprecated in favor of :class:`pyparsing_common.upcaseTokens`

downcaseTokens(s, l, t)

(Deprecated) Helper parse action to convert tokens to lower case. Deprecated in favor of :class:`pyparsing_common.downcaseTokens`
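A minimal sketch of the suggested replacement, :class:`pyparsing_common.upcaseTokens`, used as a parse action; the input words are illustrative:

```python
from pyparsing import Word, alphas, OneOrMore, pyparsing_common

# pyparsing_common.upcaseTokens replaces the deprecated module-level upcaseTokens
word = Word(alphas).setParseAction(pyparsing_common.upcaseTokens)
shout = OneOrMore(word).parseString("now is the time").asList()
print(shout)  # ['NOW', 'IS', 'THE', 'TIME']
```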

makeHTMLTags(tagStr)

Helper to construct opening and closing tag expressions for HTML, given a tag name. Matches tags in either upper or lower case, attributes with namespaces and with quoted or unquoted values.

Example:

   from pyparsing import makeHTMLTags, SkipTo

   text = '<td>More info at the <a href="https://github.com/pyparsing/pyparsing/wiki">pyparsing</a> wiki page</td>'
   # makeHTMLTags returns pyparsing expressions for the opening and
   # closing tags as a 2-tuple
   a, a_end = makeHTMLTags("A")
   link_expr = a + SkipTo(a_end)("link_text") + a_end

   for link in link_expr.searchString(text):
       # attributes in the <A> tag (like "href" shown here) are
       # also accessible as named results
       print(link.link_text, '->', link.href)

prints:

   pyparsing -> https://github.com/pyparsing/pyparsing/wiki

makeXMLTags(tagStr)

Helper to construct opening and closing tag expressions for XML, given a tag name. Matches tags only in the given upper/lower case.

Example: similar to :class:`makeHTMLTags`
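A sketch of the case-sensitivity difference between the two helpers; the ``Item`` tag and its ``id`` attribute are hypothetical:

```python
from pyparsing import makeXMLTags, makeHTMLTags

xml_open, xml_close = makeXMLTags("Item")
html_open, html_close = makeHTMLTags("Item")

# XML tag names must match case exactly; HTML tag names match in either case
xml_hits = len(xml_open.searchString('<item id="1">'))
html_hits = len(html_open.searchString('<item id="1">'))
exact_hits = len(xml_open.searchString('<Item id="1">'))
print(xml_hits, html_hits, exact_hits)  # 0 1 1
```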

withAttribute(*args, **attrDict)

Helper to create a validating parse action to be used with start tags created with :class:`makeXMLTags` or :class:`makeHTMLTags`. Use ``withAttribute`` to qualify a starting tag with a required attribute value, to avoid false matches on common tags such as ``<TD>`` or ``<DIV>``.

Call ``withAttribute`` with a series of attribute names and values. Specify the list of filter attribute names and values as:

  • keyword arguments, as in ``(align="right")``, or
  • an explicit dict with the ``**`` operator, when an attribute name is also a Python reserved word, as in ``**{"class":"Customer", "align":"right"}``, or
  • a list of name-value tuples, as in ``(("ns1:class", "Customer"), ("ns2:align", "right"))``

For attribute names with a namespace prefix, you must use the second or third form. Attribute names are matched without regard to upper/lower case.

If just testing for ``class`` (with or without a namespace), use :class:`withClass`.

To verify that the attribute exists, but without specifying a value, pass ``withAttribute.ANY_VALUE`` as the value.

Example:

   from pyparsing import makeHTMLTags, withAttribute, SkipTo

   html = '''
       <div>
       Some text
       <div type="grid">1 4 0 1 0</div>
       <div type="graph">1,3 2,3 1,1</div>
       <div>this has no type</div>
       </div>

   '''
   div,div_end = makeHTMLTags("div")

   # only match div tag having a type attribute with value "grid"
   div_grid = div().setParseAction(withAttribute(type="grid"))
   grid_expr = div_grid + SkipTo(div | div_end)("body")
   for grid_header in grid_expr.searchString(html):
       print(grid_header.body)

   # construct a match with any div tag having a type attribute, regardless of the value
   div_any_type = div().setParseAction(withAttribute(type=withAttribute.ANY_VALUE))
   div_expr = div_any_type + SkipTo(div | div_end)("body")
   for div_header in div_expr.searchString(html):
       print(div_header.body)

prints:

   1 4 0 1 0

   1 4 0 1 0
   1,3 2,3 1,1
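For attribute names with a namespace prefix, the name-value tuple form applies. A sketch; the ``ns1:class`` attribute and the HTML snippet are hypothetical:

```python
from pyparsing import makeHTMLTags, withAttribute, SkipTo

html = '<div ns1:class="Customer">Acme</div><div ns1:class="Vendor">Globex</div>'
div, div_end = makeHTMLTags("div")

# "ns1:class" is not a valid keyword argument name, so pass a (name, value) tuple
customer_div = div().setParseAction(withAttribute(("ns1:class", "Customer")))
customer_expr = customer_div + SkipTo(div_end)("body") + div_end

bodies = [match.body for match in customer_expr.searchString(html)]
print(bodies)  # ['Acme']
```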

withClass(classname, namespace='')

Simplified version of :class:`withAttribute` when matching on a div class - made difficult because ``class`` is a reserved word in Python.

Example:

   from pyparsing import makeHTMLTags, withClass, withAttribute, SkipTo

   html = '''
       <div>
       Some text
       <div class="grid">1 4 0 1 0</div>
       <div class="graph">1,3 2,3 1,1</div>
       <div>this &lt;div&gt; has no class</div>
       </div>

   '''
   div,div_end = makeHTMLTags("div")
   div_grid = div().setParseAction(withClass("grid"))

   grid_expr = div_grid + SkipTo(div | div_end)("body")
   for grid_header in grid_expr.searchString(html):
       print(grid_header.body)

   div_any_type = div().setParseAction(withClass(withAttribute.ANY_VALUE))
   div_expr = div_any_type + SkipTo(div | div_end)("body")
   for div_header in div_expr.searchString(html):
       print(div_header.body)

prints:

   1 4 0 1 0

   1 4 0 1 0
   1,3 2,3 1,1

infixNotation(baseExpr, opList, lpar=Suppress:("("), rpar=Suppress:(")"))

source code 
Helper method for constructing grammars of expressions made up of
operators working in a precedence hierarchy.  Operators may be unary
or binary, left- or right-associative.  Parse actions can also be
attached to operator expressions. The generated parser will also
recognize the use of parentheses to override operator precedences
(see example below).

Note: if you define a deep operator list, you may see performance
issues when using infixNotation. See
:class:`ParserElement.enablePackrat` for a mechanism to potentially
improve your parser performance.

Parameters:
 - baseExpr - expression representing the most basic operand of the
   nested expression grammar
 - opList - list of tuples, one for each operator precedence level
   in the expression grammar; each tuple is of the form ``(opExpr,
   numTerms, rightLeftAssoc, parseAction)``, where:

   - opExpr is the pyparsing expression for the operator; may also
     be a string, which will be converted to a Literal; if numTerms
     is 3, opExpr is a tuple of two expressions, for the two
     operators separating the 3 terms
   - numTerms is the number of terms for this operator (must be 1,
     2, or 3)
   - rightLeftAssoc is the indicator whether the operator is right
     or left associative, using the pyparsing-defined constants
     ``opAssoc.RIGHT`` and ``opAssoc.LEFT``.
   - parseAction is the parse action to be associated with
     expressions matching this operator expression (the parse action
     tuple member may be omitted); if the parse action is passed
     a tuple or list of functions, this is equivalent to calling
     ``setParseAction(*fn)``
     (:class:`ParserElement.setParseAction`)
 - lpar - expression for matching left-parentheses
   (default= ``Suppress('(')``)
 - rpar - expression for matching right-parentheses
   (default= ``Suppress(')')``)
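When ``numTerms`` is 3, ``opExpr`` is a pair of operators. A small sketch, assuming a C-style conditional ``?:`` as a hypothetical right-associative ternary level placed below ``+``/``-`` in precedence:

```python
from pyparsing import infixNotation, opAssoc, oneOf, pyparsing_common

integer = pyparsing_common.integer

ternary_expr = infixNotation(integer, [
    (oneOf("* /"), 2, opAssoc.LEFT),
    (oneOf("+ -"), 2, opAssoc.LEFT),
    # numTerms == 3: opExpr is a (first, second) tuple of operator strings
    (("?", ":"), 3, opAssoc.RIGHT),
])

result = ternary_expr.parseString("1 ? 2 : 3 + 4").asList()
print(result)  # [[1, '?', 2, ':', [3, '+', 4]]]
```

Because ``+`` binds more tightly than ``?:`` in this table, ``3 + 4`` is grouped as the third term.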

Example:

    from pyparsing import infixNotation, opAssoc, oneOf, pyparsing_common

    # simple example of four-function arithmetic with ints and
    # variable names
    integer = pyparsing_common.signed_integer
    varname = pyparsing_common.identifier

    arith_expr = infixNotation(integer | varname,
        [
        ('-', 1, opAssoc.RIGHT),
        (oneOf('* /'), 2, opAssoc.LEFT),
        (oneOf('+ -'), 2, opAssoc.LEFT),
        ])

    arith_expr.runTests('''
        5+3*6
        (5+3)*6
        -2--11
        ''', fullDump=False)

prints:

    5+3*6
    [[5, '+', [3, '*', 6]]]

    (5+3)*6
    [[[5, '+', 3], '*', 6]]

    -2--11
    [[['-', 2], '-', ['-', 11]]]
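The optional fourth tuple member (``parseAction``) lets each precedence level compute a value instead of returning a nested list. The following is a minimal evaluator sketch for the same operator table, assuming integer operands and Python's ``operator`` semantics (true division for ``/``). ``eval_negate`` and ``eval_binary`` are hypothetical helper names. Each parse action receives a single group whose operands have already been evaluated by the tighter levels:

```python
import operator
from pyparsing import infixNotation, opAssoc, oneOf, pyparsing_common

# Map operator strings to Python functions ("/" is true division here)
BINARY_OPS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}

def eval_negate(tokens):
    # tokens[0] is the group ['-', operand]; operand is already a number
    return -tokens[0][1]

def eval_binary(tokens):
    # tokens[0] is the group [operand, op, operand, op, operand, ...];
    # fold it left to right, matching opAssoc.LEFT
    group = tokens[0]
    value = group[0]
    for op, operand in zip(group[1::2], group[2::2]):
        value = BINARY_OPS[op](value, operand)
    return value

calc = infixNotation(pyparsing_common.integer, [
    ("-", 1, opAssoc.RIGHT, eval_negate),
    (oneOf("* /"), 2, opAssoc.LEFT, eval_binary),
    (oneOf("+ -"), 2, opAssoc.LEFT, eval_binary),
])

for text in ("5+3*6", "(5+3)*6", "-2--11"):
    print(text, "=", calc.parseString(text)[0])
```

This prints ``23``, ``48`` and ``9`` for the three inputs used in the example above.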

operatorPrecedence(baseExpr, opList, lpar=Suppress:("("), rpar=Suppress:(")"))

(Deprecated) Former name of :class:`infixNotation`; it will be dropped in a future release.

nestedExpr(opener='(', closer=')', content=None, ignoreExpr=quotedString using single or double quotes)

Helper method for defining nested lists enclosed in opening and closing delimiters ("(" and ")" are the default).

Parameters:

  • opener - opening character for a nested list (default= ``"("``); can also be a pyparsing expression
  • closer - closing character for a nested list (default= ``")"``); can also be a pyparsing expression
  • content - expression for items within the nested lists (default= ``None``)
  • ignoreExpr - expression matching text in which opening and closing delimiters should be ignored (default= :class:`quotedString`)

If an expression is not provided for the content argument, the nested expression will capture all whitespace-delimited content between delimiters as a list of separate values.

Use the ``ignoreExpr`` argument to define expressions that may contain opening or closing characters that should not be treated as opening or closing characters for nesting, such as quotedString or a comment expression. Specify multiple expressions using an :class:`Or` or :class:`MatchFirst`. The default is :class:`quotedString`, but if no expressions are to be ignored, then pass ``None`` for this argument.

Example:

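   from pyparsing import (Combine, Group, Optional, Suppress, Word, alphanums,
                          alphas, cStyleComment, delimitedList, nestedExpr,
                          oneOf, pyparsing_common, quotedString)

   data_type = oneOf("void int short long char float double")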
   decl_data_type = Combine(data_type + Optional(Word('*')))
   ident = Word(alphas+'_', alphanums+'_')
   number = pyparsing_common.number
   arg = Group(decl_data_type + ident)
   LPAR, RPAR = map(Suppress, "()")

   code_body = nestedExpr('{', '}', ignoreExpr=(quotedString | cStyleComment))

   c_function = (decl_data_type("type")
                 + ident("name")
                 + LPAR + Optional(delimitedList(arg), [])("args") + RPAR
                 + code_body("body"))
   c_function.ignore(cStyleComment)

   source_code = '''
       int is_odd(int x) {
           return (x%2);
       }

       int dec_to_hex(char hchar) {
           if (hchar >= '0' && hchar <= '9') {
               return (ord(hchar)-ord('0'));
           } else {
               return (10+ord(hchar)-ord('A'));
           }
       }
   '''
   for func in c_function.searchString(source_code):
       print("%(name)s (%(type)s) args: %(args)s" % func)

prints:

   is_odd (int) args: [['int', 'x']]
   dec_to_hex (int) args: [['char', 'hchar']]
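The effect of the default ``content=None`` and of ``ignoreExpr`` can be seen on a couple of tiny, made-up inputs. In this sketch, the default :class:`quotedString` ignore expression keeps a parenthesis inside a quoted string from affecting the nesting, while passing ``ignoreExpr=None`` leaves the same text unbalanced:

```python
from pyparsing import nestedExpr, ParseException

# Default delimiters "(" and ")"; content=None splits on whitespace
simple = nestedExpr().parseString("(a (b c) d)").asList()
print(simple)  # [['a', ['b', 'c'], 'd']]

text = '(say "a (b" c)'

# Default ignoreExpr=quotedString: the "(" inside quotes is not a delimiter,
# and the quoted string is kept as a single item
quoted = nestedExpr().parseString(text).asList()
print(quoted)  # [['say', '"a (b"', 'c']]

# ignoreExpr=None: the quoted "(" opens a nested list that never closes
try:
    nestedExpr(ignoreExpr=None).parseString(text)
    unbalanced_failed = False
except ParseException:
    unbalanced_failed = True
print(unbalanced_failed)  # True
```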

indentedBlock(blockStatementExpr, indentStack, indent=True)

Helper method for defining space-delimited indentation blocks, such as those used to define block statements in Python source code.

Parameters:

  • blockStatementExpr - expression defining syntax of statement that is repeated within the indented block
  • indentStack - list created by caller to manage indentation stack (multiple statementWithIndentedBlock expressions within a single grammar should share a common indentStack)
  • indent - boolean indicating whether block must be indented beyond the current level; set to False for block of left-most statements (default= ``True``)

A valid block must contain at least one ``blockStatement``.

Example:

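   from pyparsing import (Forward, Group, OneOrMore, Optional, Word, alphanums,
                          alphas, delimitedList, indentedBlock, nums)

   data = '''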
   def A(z):
     A1
     B = 100
     G = A2
     A2
     A3
   B
   def BB(a,b,c):
     BB1
     def BBA():
       bba1
       bba2
       bba3
   C
   D
   def spam(x,y):
        def eggs(z):
            pass
   '''


   indentStack = [1]
   stmt = Forward()

   identifier = Word(alphas, alphanums)
   funcDecl = ("def" + identifier + Group("(" + Optional(delimitedList(identifier)) + ")") + ":")
   func_body = indentedBlock(stmt, indentStack)
   funcDef = Group(funcDecl + func_body)

   rvalue = Forward()
   funcCall = Group(identifier + "(" + Optional(delimitedList(rvalue)) + ")")
   rvalue << (funcCall | identifier | Word(nums))
   assignment = Group(identifier + "=" + rvalue)
   stmt << (funcDef | assignment | identifier)

   module_body = OneOrMore(stmt)

   parseTree = module_body.parseString(data)
   parseTree.pprint()

prints:

   [['def',
     'A',
     ['(', 'z', ')'],
     ':',
     [['A1'], [['B', '=', '100']], [['G', '=', 'A2']], ['A2'], ['A3']]],
    'B',
    ['def',
     'BB',
     ['(', 'a', 'b', 'c', ')'],
     ':',
     [['BB1'], [['def', 'BBA', ['(', ')'], ':', [['bba1'], ['bba2'], ['bba3']]]]]],
    'C',
    'D',
    ['def',
     'spam',
     ['(', 'x', 'y', ')'],
     ':',
     [[['def', 'eggs', ['(', 'z', ')'], ':', [['pass']]]]]]]
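A much smaller input may make the shape of the returned structure easier to see. In this sketch the one-word statements and the ``name:`` block header are invented for illustration. As in the output above, each statement inside the block comes back wrapped in its own group, and the whole block forms one further group:

```python
from pyparsing import Forward, Group, OneOrMore, Suppress, Word, alphas, indentedBlock

data = "a:\n  b\n  c\nd\n"

indent_stack = [1]  # shared by every indentedBlock in this grammar
stmt = Forward()
name = Word(alphas)

# Hypothetical syntax: "name:" followed by an indented block of statements
block = Group(name + Suppress(":") + indentedBlock(stmt, indent_stack))
stmt <<= block | name

result = OneOrMore(stmt).parseString(data).asList()
print(result)  # [['a', [['b'], ['c']]], 'd']
```

The unindented ``d`` ends the block and is parsed as a top-level statement.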

Variables Details

__doc__

Value:
"""
pyparsing module - Classes and methods to define and execute parsing g\
rammars
======================================================================\
=======

The pyparsing module is an alternative approach to creating and
executing simple grammars, vs. the traditional lex/yacc approach, or t\
...

singleArgBuiltins

Value:
[<built-in function sum>,
 <built-in function len>,
 <built-in function sorted>,
 <type 'reversed'>,
 <type 'list'>,
 <type 'tuple'>,
 <type 'set'>,
 <built-in function any>,
...

alphanums

Value:
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'

printables

Value:
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\\
'()*+,-./:;<=>?@[\\]^_`{|}~'

_singleChar

Value:
{W:(\, \[]-...) | Re:('\\\\0?[xX][0-9a-fA-F]+') | Re:('\\\\0[0-7]+') |\
 !W:(\])}

_charRange

Value:
Group:({{W:(\, \[]-...) | Re:('\\\\0?[xX][0-9a-fA-F]+') | Re:('\\\\0[0\
-7]+') | !W:(\])} Suppress:("-") {W:(\, \[]-...) | Re:('\\\\0?[xX][0-9\
a-fA-F]+') | Re:('\\\\0[0-7]+') | !W:(\])}})

_reBracketExpr

Value:
{"[" ["^"] Group:({{Group:({{W:(\, \[]-...) | Re:('\\\\0?[xX][0-9a-fA-\
F]+') | Re:('\\\\0[0-7]+') | !W:(\])} Suppress:("-") {W:(\, \[]-...) |\
 Re:('\\\\0?[xX][0-9a-fA-F]+') | Re:('\\\\0[0-7]+') | !W:(\])}}) | W:(\
\, \[]-...) | Re:('\\\\0?[xX][0-9a-fA-F]+') | Re:('\\\\0[0-7]+') | !W:\
(\])}}...) "]"}

alphas8bit

Value:
u'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
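The character-set strings listed above (``alphanums``, ``printables``, ``alphas8bit``) are typically passed to :class:`Word`. A short sketch with made-up inputs:

```python
from pyparsing import OneOrMore, Word, alphanums, alphas, alphas8bit, printables

# Word(printables): any run of non-whitespace characters
chunks = OneOrMore(Word(printables)).parseString("x=1, y=(2+3);").asList()
print(chunks)  # ['x=1,', 'y=(2+3);']

# Word(alphas, alphanums): a letter first, then letters or digits
ident = Word(alphas, alphanums).parseString("var2x")[0]
print(ident)  # var2x

# alphas8bit adds Latin-1 letters such as "ü"
name = Word(alphas + alphas8bit).parseString("Müller")[0]
print(name)  # Müller
```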

_htmlEntityMap

Value:
{'amp': '&',
 'apos': '\'',
 'gt': '>',
 'lt': '<',
 'nbsp': ' ',
 'quot': '"'}

cStyleComment

Comment of the form ``/* ... */``

Value:
C style comment

htmlComment

Comment of the form ``<!-- ... -->``

Value:
HTML comment

dblSlashComment

Comment of the form ``// ... (to end of line)``

Value:
// comment

pythonStyleComment

Comment of the form ``# ... (to end of line)``

Value:
Python style comment
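The predefined comment expressions can be used either to extract comments or, through ``ignore()``, to let comments appear anywhere between the tokens of a grammar. A sketch with illustrative input strings:

```python
from pyparsing import OneOrMore, Word, alphas, cStyleComment, dblSlashComment, pythonStyleComment

src = "int x; /* block */ int y; // trailing"

# scanString yields (tokens, start, end) for every match in the text
c_comments = [t[0] for t, start, end in cStyleComment.scanString(src)]
line_comments = [t[0] for t, start, end in dblSlashComment.scanString(src)]
print(c_comments)     # ['/* block */']
print(line_comments)  # ['// trailing']

# ignore() skips "# ..." comments wherever they occur between words
words = OneOrMore(Word(alphas))
words.ignore(pythonStyleComment)
kept = words.parseString("alpha beta  # gamma\ndelta").asList()
print(kept)  # ['alpha', 'beta', 'delta']
```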

commaSeparatedList

(Deprecated) Predefined expression of 1 or more printable words or quoted strings, separated by commas.

This expression is deprecated in favor of :class:`pyparsing_common.comma_separated_list`.

Value:
commaSeparatedList
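A minimal sketch of the recommended replacement, :class:`pyparsing_common.comma_separated_list`, on an illustrative input. Unquoted items may contain embedded spaces, and a quoted item may contain commas; in this sketch the quoted item is returned with its quotation marks:

```python
from pyparsing import pyparsing_common

csv_line = 'red apple, "pear, ripe", plum'
fields = pyparsing_common.comma_separated_list.parseString(csv_line).asList()
print(fields)  # ['red apple', '"pear, ripe"', 'plum']
```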