Package schrodinger :: Package application :: Package desmond :: Package antlr3 :: Module recognizers :: Class Lexer

Class Lexer

    object --+    
             |    
BaseRecognizer --+
                 |
    object --+   |
             |   |
   TokenSource --+
                 |
                Lexer
Base class for generated lexer classes.

A lexer is a recognizer that draws input symbols from a character stream. Lexer grammars result in a subclass of this object. A Lexer object uses simplified match() and error recovery mechanisms in the interest of speed.
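The idea above can be sketched in miniature: a lexer holds a character stream, looks ahead with LA(), and match() simply compares and consumes. The CharStream and ToyLexer classes below are illustrative stand-ins, not the actual antlr3 API.

```python
class CharStream:
    """Minimal character stream: an index over a string."""

    def __init__(self, data):
        self.data = data
        self.index = 0

    def LA(self, i=1):
        # Look ahead i characters; -1 signals EOF (as in ANTLR).
        pos = self.index + i - 1
        return self.data[pos] if pos < len(self.data) else -1

    def consume(self):
        self.index += 1


class ToyLexer:
    """Simplified match(): compare and consume, no fancy recovery."""

    def __init__(self, input):
        self.input = input

    def match(self, c):
        if self.input.LA() != c:
            raise ValueError("expected %r, found %r" % (c, self.input.LA()))
        self.input.consume()


stream = CharStream("ab")
lexer = ToyLexer(stream)
lexer.match("a")
lexer.match("b")
```

After both matches the stream is exhausted and LA() reports EOF.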

Instance Methods
 
__init__(self, input, state=None)
x.__init__(...) initializes x; see help(type(x)) for signature
 
reset(self)
Reset the parser's state; subclasses must rewind the input stream.
 
nextToken(self)
Return a token from this source; i.e., match a token on the char stream.
 
skip(self)
Instruct the lexer to skip creating a token for the current lexer rule and look for another token.
 
mTokens(self)
This is the lexer entry point that sets the instance variable 'token'.
 
setCharStream(self, input)
Set the char stream and reset the lexer
 
getSourceName(self)
 
emit(self, token=None)
The standard method called to automatically emit a token at the outermost lexical rule.
 
match(self, s)
Match current input symbol against ttype.
 
matchAny(self)
Match the wildcard ('.'), i.e., any input symbol.
 
matchRange(self, a, b)
 
getLine(self)
 
getCharPositionInLine(self)
 
getCharIndex(self)
What is the index of the current character of lookahead?
 
getText(self)
Return the text matched so far for the current token or any text override.
 
setText(self, text)
Set the complete text of this token; it wipes any previous changes to the text.
 
reportError(self, e)
Report a recognition problem.
 
getErrorMessage(self, e, tokenNames)
What error message should be generated for the various exception types?
 
getCharErrorDisplay(self, c)
 
recover(self, re)
Lexers can normally match any char in their vocabulary after matching a token, so do the easy thing and just kill a character and hope it all works out.
 
traceIn(self, ruleName, ruleIndex)
 
traceOut(self, ruleName, ruleIndex)

Inherited from BaseRecognizer: alreadyParsedRule, beginResync, combineFollows, computeContextSensitiveRuleFOLLOW, computeErrorRecoverySet, consumeUntil, displayRecognitionError, emitErrorMessage, endResync, failed, getBacktrackingLevel, getCurrentInputSymbol, getErrorHeader, getGrammarFileName, getMissingSymbol, getNumberOfSyntaxErrors, getRuleInvocationStack, getRuleMemoization, getTokenErrorDisplay, memoize, mismatchIsMissingToken, mismatchIsUnwantedToken, recoverFromMismatchedSet, recoverFromMismatchedToken, setBacktrackingLevel, setInput, toStrings

Inherited from TokenSource: __iter__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Methods

Inherited from BaseRecognizer (private): _getRuleInvocationStack

Class Variables

Inherited from BaseRecognizer: DEFAULT_TOKEN_CHANNEL, HIDDEN, MEMO_RULE_FAILED, MEMO_RULE_UNKNOWN, antlr_version, antlr_version_str, tokenNames

Properties
  text
Return the text matched so far for the current token or any text override.

Inherited from object: __class__

Method Details

__init__(self, input, state=None)
(Constructor)

 

x.__init__(...) initializes x; see help(type(x)) for signature

Overrides: object.__init__
(inherited documentation)

reset(self)

 

Reset the parser's state; subclasses must rewind the input stream.

Overrides: BaseRecognizer.reset
(inherited documentation)

nextToken(self)

 

Return a token from this source; i.e., match a token on the char stream.

Overrides: TokenSource.nextToken

skip(self)

 

Instruct the lexer to skip creating a token for the current lexer rule and look for another token. nextToken() knows to keep looking when a lexer rule finishes with the token set to SKIP_TOKEN. Recall that if the token is None at the end of any token rule, nextToken() creates one for you and emits it.
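The nextToken()/skip() interaction described above can be sketched as a loop: when a rule sets the token to a SKIP sentinel the loop keeps scanning, and when a rule completes normally a token is emitted. SKIP_TOKEN, EOF_TOKEN, and the inline rule dispatch are simplified stand-ins for the antlr3 internals.

```python
SKIP_TOKEN = object()   # sentinel left by a rule that called skip()
EOF_TOKEN = object()    # sentinel appended at end of input


def tokenize(text):
    """Split text into word tokens, skipping whitespace."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            token = SKIP_TOKEN      # a WS rule would call self.skip()
            i += 1
        else:
            start = i
            while i < len(text) and not text[i].isspace():
                i += 1
            token = text[start:i]   # emit() would build a real Token
        if token is not SKIP_TOKEN: # nextToken() keeps looking on SKIP
            tokens.append(token)
    tokens.append(EOF_TOKEN)
    return tokens


toks = tokenize("ab  cd")
```

Here `toks` holds the two word tokens followed by the EOF sentinel; the whitespace run produced no tokens at all.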

getSourceName(self)

 
Overrides: BaseRecognizer.getSourceName

emit(self, token=None)

 

The standard method called to automatically emit a token at the outermost lexical rule. The token object should point into the char buffer start..stop. If there is a text override in 'text', use that to set the token's text. Override this method to emit custom Token objects.

If you are building trees, then you should also override Parser or TreeParser.getMissingSymbol().
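An emit() override in the spirit described above might look like the following. The Token and BaseLexer classes, and the `source` attribute added to each token, are illustrative mocks rather than the real antlr3 classes.

```python
class Token:
    """Minimal token: just text and a line number."""

    def __init__(self, text, line):
        self.text = text
        self.line = line


class BaseLexer:
    """Default emit(): record the token and return it."""

    def __init__(self):
        self.tokens = []

    def emit(self, token=None):
        self.tokens.append(token)
        return token


class AnnotatedLexer(BaseLexer):
    """Override emit() to attach extra information at emit time."""

    def emit(self, token=None):
        token = super().emit(token)
        token.source = "query.txt"  # hypothetical custom field
        return token


lexer = AnnotatedLexer()
tok = lexer.emit(Token("SELECT", 1))
```

Every token now carries the extra attribute without any change to the rules that produce it.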

match(self, s)

 

Match current input symbol against ttype. Attempt single token insertion or deletion error recovery. If that fails, throw MismatchedTokenException.

To turn off single token insertion or deletion error recovery, override recoverFromMismatchedToken() and have it throw an exception. See TreeParser.recoverFromMismatchedToken(). This way any error in a rule will cause an exception and immediate exit from rule. Rule would recover by resynchronizing to the set of symbols that can follow rule ref.

Overrides: BaseRecognizer.match
(inherited documentation)
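The bail-out strategy described above can be sketched as follows: override recoverFromMismatchedToken() to raise instead of attempting single-token insertion or deletion, so any mismatch exits the rule immediately. The classes here are simplified stand-ins for the antlr3 ones.

```python
class MismatchedTokenException(Exception):
    pass


class DefaultLexer:
    def recoverFromMismatchedToken(self, input, ttype):
        # Default strategy: attempt single-token insertion/deletion
        # (elided in this sketch; just pretend recovery succeeded).
        return input

    def match(self, input, ttype):
        if input != ttype:
            return self.recoverFromMismatchedToken(input, ttype)
        return input


class BailLexer(DefaultLexer):
    def recoverFromMismatchedToken(self, input, ttype):
        # No repair attempt: any mismatch raises and exits the rule.
        raise MismatchedTokenException((input, ttype))


try:
    BailLexer().match("a", "b")
    bailed = False
except MismatchedTokenException:
    bailed = True
```

The default recognizer quietly patches the mismatch; the bail-out subclass turns it into an immediate exception.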

matchAny(self)

 

Match the wildcard ('.'), i.e., any input symbol.

Overrides: BaseRecognizer.matchAny
(inherited documentation)

reportError(self, e)

 
Report a recognition problem.
    
This method sets errorRecovery to indicate the parser is recovering,
not parsing.  Once in recovery mode, no further errors are reported.
To get out of recovery mode, the parser must successfully match
a token (after a resync).  So it will go:

1. error occurs
2. enter recovery mode, report error
3. consume until token found in resynch set
4. try to resume parsing
5. next match() will reset errorRecovery mode

If you override, make sure to update syntaxErrors if you care about
that.

Overrides: BaseRecognizer.reportError
(inherited documentation)
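The recovery-mode sequence above in miniature: the first error is counted, subsequent errors are suppressed while the flag is set, and a successful match clears it. This is a simplified stand-in, not the antlr3 implementation.

```python
class Recognizer:
    def __init__(self):
        self.errorRecovery = False
        self.syntaxErrors = 0

    def reportError(self, e):
        if self.errorRecovery:
            return                    # already recovering: stay quiet
        self.errorRecovery = True     # enter recovery mode
        self.syntaxErrors += 1        # keep the count accurate

    def match_ok(self):
        self.errorRecovery = False    # successful match resets the mode


r = Recognizer()
r.reportError("error 1")   # reported
r.reportError("error 2")   # suppressed: still in recovery mode
r.match_ok()               # resync succeeded
r.reportError("error 3")   # reported again
```

Only two errors are counted, matching steps 1-5 above: the cascade between the first error and the resync is silent.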

getErrorMessage(self, e, tokenNames)

 

What error message should be generated for the various exception types?

Not very object-oriented code, but I like having all error message generation within one method rather than spread among all of the exception classes. This also makes the exception handling much easier, because the exception classes do not have to keep pointers back to this object to access utility routines and so on. Also, changing the message for an exception type would otherwise be difficult: you would have to subclass the exception and then somehow get ANTLR to create those kinds of exception objects instead of the default ones. This looks weird, but trust me--it makes the most sense in terms of flexibility.

For grammar debugging, you will want to override this to add more information such as the stack frame with getRuleInvocationStack(e, this.getClass().getName()) and, for no viable alts, the decision description and state etc...

Override this to change the message generated for one or more exception types.

Overrides: BaseRecognizer.getErrorMessage
(inherited documentation)
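A dispatcher in the spirit described above: all exception types funnel through one function, so changing the wording for a type means editing one place. The exception classes and token-name table below are illustrative, not the real antlr3 ones.

```python
class MismatchedTokenException(Exception):
    def __init__(self, expecting):
        self.expecting = expecting  # token type index that was expected


class NoViableAltException(Exception):
    pass


def getErrorMessage(e, tokenNames):
    """Single point of message generation for every exception type."""
    if isinstance(e, MismatchedTokenException):
        return "mismatched input; expecting " + tokenNames[e.expecting]
    if isinstance(e, NoViableAltException):
        return "no viable alternative"
    return "recognition error"


names = ["<invalid>", "ID", "INT"]
msg = getErrorMessage(MismatchedTokenException(2), names)
```

An override for grammar debugging would wrap this and append, e.g., the rule invocation stack to the returned string.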

recover(self, re)

 

Lexers can normally match any char in their vocabulary after matching a token, so do the easy thing and just kill a character and hope it all works out. You can instead use the rule invocation stack to do sophisticated error recovery if you are in a fragment rule.

Overrides: BaseRecognizer.recover
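The "kill a character" recovery above amounts to consuming exactly one input character so scanning can resume on the next one. CharStream is a simplified stand-in for the antlr3 char stream.

```python
class CharStream:
    def __init__(self, data):
        self.data = data
        self.index = 0

    def consume(self):
        self.index += 1


def recover(input):
    # Single-character panic: drop the offending char and carry on.
    input.consume()


s = CharStream("$ab")
recover(s)   # '$' matched no rule; kill it and resume at 'a'
```

After recovery the stream is positioned on 'a' and normal token matching can continue.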

traceIn(self, ruleName, ruleIndex)

 
Overrides: BaseRecognizer.traceIn

traceOut(self, ruleName, ruleIndex)

 
Overrides: BaseRecognizer.traceOut

Property Details

text

Return the text matched so far for the current token or any text override.

Get Method:
getText(self) - Return the text matched so far for the current token or any text override.
Set Method:
setText(self, text) - Set the complete text of this token; it wipes any previous changes to the text.
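The get/set pair above behaves like a Python property with an override slot: reading returns the text matched so far unless setText() has installed an override. MiniLexer below is a simplified mock of that behavior, not the real Lexer.

```python
class MiniLexer:
    def __init__(self, matched):
        self._matched = matched    # text matched so far for the token
        self._override = None      # set by the text setter

    @property
    def text(self):
        if self._override is not None:
            return self._override  # explicit override wins
        return self._matched       # otherwise, the matched text

    @text.setter
    def text(self, value):
        # Wipes any previous changes: the override fully replaces
        # the matched text from the reader's point of view.
        self._override = value


plain = MiniLexer("foo")      # no override: reads the matched text
rewritten = MiniLexer("foo")
rewritten.text = "BAR"        # override installed via the setter
```

A generated rule action such as `$text = "BAR"` maps onto exactly this setter.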