org.antlr.runtime
Class BaseRecognizer

java.lang.Object
  extended by org.antlr.runtime.BaseRecognizer
Direct Known Subclasses:
Lexer, Parser, TreeParser

public abstract class BaseRecognizer
extends java.lang.Object

A generic recognizer that can handle recognizers generated from lexer, parser, and tree grammars. This is all the parsing support code essentially; most of it is error recovery stuff and backtracking.


Field Summary
protected  int _fsp
           
protected  int backtracking
          If 0, no backtracking is going on.
static int DEFAULT_TOKEN_CHANNEL
           
protected  boolean errorRecovery
          This is true when we see an error and before having successfully matched a token.
protected  boolean failed
          In lieu of a return value, this indicates that a rule or token has failed to match.
protected  BitSet[] following
          Track the set of token types that can follow any rule invocation.
static int HIDDEN
           
static int INITIAL_FOLLOW_STACK_SIZE
           
protected  int lastErrorIndex
          The index into the input stream where the last error occurred.
static int MEMO_RULE_FAILED
           
static java.lang.Integer MEMO_RULE_FAILED_I
           
static int MEMO_RULE_UNKNOWN
           
static java.lang.String NEXT_TOKEN_RULE_NAME
           
protected  java.util.Map[] ruleMemo
          An array[size num rules] of Map that tracks the stop token index for each rule.
 
Constructor Summary
BaseRecognizer()
           
 
Method Summary
 boolean alreadyParsedRule(IntStream input, int ruleIndex)
          Has this rule already parsed input at the current index in the input stream? Return the stop token index or MEMO_RULE_UNKNOWN.
 void beginResync()
          A hook to listen in on the token consumption during error recovery.
protected  BitSet combineFollows(boolean exact)
           
protected  BitSet computeContextSensitiveRuleFOLLOW()
          Compute the context-sensitive FOLLOW set for current rule.
protected  BitSet computeErrorRecoverySet()
           
 void consumeUntil(IntStream input, BitSet set)
          Consume tokens until one matches the given token set
 void consumeUntil(IntStream input, int tokenType)
           
 void displayRecognitionError(java.lang.String[] tokenNames, RecognitionException e)
           
 void emitErrorMessage(java.lang.String msg)
          Override this method to change where error messages go
 void endResync()
           
 int getBacktrackingLevel()
           
 java.lang.String getErrorHeader(RecognitionException e)
          What is the error header, normally line/character position information?
 java.lang.String getErrorMessage(RecognitionException e, java.lang.String[] tokenNames)
          What error message should be generated for the various exception types? Not very object-oriented code, but I like having all error message generation within one method rather than spread among all of the exception classes.
 java.lang.String getGrammarFileName()
          For debugging and other purposes, might want the grammar name.
 java.util.List getRuleInvocationStack()
          Return List of the rules in your parser instance leading up to a call to this method.
static java.util.List getRuleInvocationStack(java.lang.Throwable e, java.lang.String recognizerClassName)
          A more general version of getRuleInvocationStack where you can pass in, for example, a RecognitionException to get it's rule stack trace.
 int getRuleMemoization(int ruleIndex, int ruleStartIndex)
          Given a rule number and a start token index number, return MEMO_RULE_UNKNOWN if the rule has not parsed input starting from start index.
 int getRuleMemoizationCacheSize()
          return how many rule/input-index pairs there are in total.
 java.lang.String getTokenErrorDisplay(Token t)
          How should a token be displayed in an error message? The default is to display just the text, but during development you might want to have a lot of information spit out.
 java.lang.String[] getTokenNames()
          Used to print out token names like ID during debugging and error reporting.
 void match(IntStream input, int ttype, BitSet follow)
          Match current input symbol against ttype.
 void matchAny(IntStream input)
           
 void memoize(IntStream input, int ruleIndex, int ruleStartIndex)
          Record whether or not this rule parsed the input at this position successfully.
protected  void mismatch(IntStream input, int ttype, BitSet follow)
          factor out what to do upon token mismatch so tree parsers can behave differently.
protected  void pushFollow(BitSet fset)
          Push a rule's follow set using our own hardcoded stack
 void recover(IntStream input, RecognitionException re)
          Recover from an error found on the input stream.
protected  boolean recoverFromMismatchedElement(IntStream input, RecognitionException e, BitSet follow)
          This code is factored out from mismatched token and mismatched set recovery.
 void recoverFromMismatchedSet(IntStream input, RecognitionException e, BitSet follow)
           
 void recoverFromMismatchedToken(IntStream input, RecognitionException e, int ttype, BitSet follow)
          Attempt to recover from a single missing or extra token.
 void reportError(RecognitionException e)
          Report a recognition problem.
 void reset()
          reset the parser's state; subclasses must rewinds the input stream
 java.util.List toStrings(java.util.List tokens)
          A convenience method for use most often with template rewrites.
 void traceIn(java.lang.String ruleName, int ruleIndex, java.lang.Object inputSymbol)
           
 void traceOut(java.lang.String ruleName, int ruleIndex, java.lang.Object inputSymbol)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MEMO_RULE_FAILED

public static final int MEMO_RULE_FAILED
See Also:
Constant Field Values

MEMO_RULE_UNKNOWN

public static final int MEMO_RULE_UNKNOWN
See Also:
Constant Field Values

INITIAL_FOLLOW_STACK_SIZE

public static final int INITIAL_FOLLOW_STACK_SIZE
See Also:
Constant Field Values

MEMO_RULE_FAILED_I

public static final java.lang.Integer MEMO_RULE_FAILED_I

DEFAULT_TOKEN_CHANNEL

public static final int DEFAULT_TOKEN_CHANNEL
See Also:
Constant Field Values

HIDDEN

public static final int HIDDEN
See Also:
Constant Field Values

NEXT_TOKEN_RULE_NAME

public static final java.lang.String NEXT_TOKEN_RULE_NAME
See Also:
Constant Field Values

following

protected BitSet[] following
Track the set of token types that can follow any rule invocation. Stack grows upwards. When it hits the max, it grows 2x in size and keeps going.


_fsp

protected int _fsp

errorRecovery

protected boolean errorRecovery
This is true when we see an error and before having successfully matched a token. Prevents generation of more than one error message per error.


lastErrorIndex

protected int lastErrorIndex
The index into the input stream where the last error occurred. This is used to prevent infinite loops where an error is found but no token is consumed during recovery...another error is found, ad naseum. This is a failsafe mechanism to guarantee that at least one token/tree node is consumed for two errors.


failed

protected boolean failed
In lieu of a return value, this indicates that a rule or token has failed to match. Reset to false upon valid token match.


backtracking

protected int backtracking
If 0, no backtracking is going on. Safe to exec actions etc... If >0 then it's the level of backtracking.


ruleMemo

protected java.util.Map[] ruleMemo
An array[size num rules] of Map that tracks the stop token index for each rule. ruleMemo[ruleIndex] is the memoization table for ruleIndex. For key ruleStartIndex, you get back the stop token for associated rule or MEMO_RULE_FAILED. This is only used if rule memoization is on (which it is by default).

Constructor Detail

BaseRecognizer

public BaseRecognizer()
Method Detail

reset

public void reset()
reset the parser's state; subclasses must rewinds the input stream


match

public void match(IntStream input,
                  int ttype,
                  BitSet follow)
           throws RecognitionException
Match current input symbol against ttype. Upon error, do one token insertion or deletion if possible. You can override to not recover here and bail out of the current production to the normal error exception catch (at the end of the method) by just throwing MismatchedTokenException upon input.LA(1)!=ttype.

Throws:
RecognitionException

matchAny

public void matchAny(IntStream input)

mismatch

protected void mismatch(IntStream input,
                        int ttype,
                        BitSet follow)
                 throws RecognitionException
factor out what to do upon token mismatch so tree parsers can behave differently. Override this method in your parser to do things like bailing out after the first error; just throw the mte object instead of calling the recovery method.

Throws:
RecognitionException

reportError

public void reportError(RecognitionException e)
Report a recognition problem. This method sets errorRecovery to indicate the parser is recovering not parsing. Once in recovery mode, no errors are generated. To get out of recovery mode, the parser must successfully match a token (after a resync). So it will go: 1. error occurs 2. enter recovery mode, report error 3. consume until token found in resynch set 4. try to resume parsing 5. next match() will reset errorRecovery mode


displayRecognitionError

public void displayRecognitionError(java.lang.String[] tokenNames,
                                    RecognitionException e)

getErrorMessage

public java.lang.String getErrorMessage(RecognitionException e,
                                        java.lang.String[] tokenNames)
What error message should be generated for the various exception types? Not very object-oriented code, but I like having all error message generation within one method rather than spread among all of the exception classes. This also makes it much easier for the exception handling because the exception classes do not have to have pointers back to this object to access utility routines and so on. Also, changing the message for an exception type would be difficult because you would have to subclassing exception, but then somehow get ANTLR to make those kinds of exception objects instead of the default. This looks weird, but trust me--it makes the most sense in terms of flexibility. For grammar debugging, you will want to override this to add more information such as the stack frame with getRuleInvocationStack(e, this.getClass().getName()) and, for no viable alts, the decision description and state etc... Override this to change the message generated for one or more exception types.


getErrorHeader

public java.lang.String getErrorHeader(RecognitionException e)
What is the error header, normally line/character position information?


getTokenErrorDisplay

public java.lang.String getTokenErrorDisplay(Token t)
How should a token be displayed in an error message? The default is to display just the text, but during development you might want to have a lot of information spit out. Override in that case to use t.toString() (which, for CommonToken, dumps everything about the token). This is better than forcing you to override a method in your token objects because you don't have to go modify your lexer so that it creates a new Java type.


emitErrorMessage

public void emitErrorMessage(java.lang.String msg)
Override this method to change where error messages go


recover

public void recover(IntStream input,
                    RecognitionException re)
Recover from an error found on the input stream. Mostly this is NoViableAlt exceptions, but could be a mismatched token that the match() routine could not recover from.


beginResync

public void beginResync()
A hook to listen in on the token consumption during error recovery. The DebugParser subclasses this to fire events to the listenter.


endResync

public void endResync()

computeErrorRecoverySet

protected BitSet computeErrorRecoverySet()

computeContextSensitiveRuleFOLLOW

protected BitSet computeContextSensitiveRuleFOLLOW()
Compute the context-sensitive FOLLOW set for current rule. This is set of token types that can follow a specific rule reference given a specific call chain. You get the set of viable tokens that can possibly come next (lookahead depth 1) given the current call chain. Contrast this with the definition of plain FOLLOW for rule r: FOLLOW(r)={x | S=>*alpha r beta in G and x in FIRST(beta)} where x in T* and alpha, beta in V*; T is set of terminals and V is the set of terminals and nonterminals. In other words, FOLLOW(r) is the set of all tokens that can possibly follow references to r in *any* sentential form (context). At runtime, however, we know precisely which context applies as we have the call chain. We may compute the exact (rather than covering superset) set of following tokens. For example, consider grammar: stat : ID '=' expr ';' // FOLLOW(stat)=={EOF} | "return" expr '.' ; expr : atom ('+' atom)* ; // FOLLOW(expr)=={';','.',')'} atom : INT // FOLLOW(atom)=={'+',')',';','.'} | '(' expr ')' ; The FOLLOW sets are all inclusive whereas context-sensitive FOLLOW sets are precisely what could follow a rule reference. For input input "i=(3);", here is the derivation: stat => ID '=' expr ';' => ID '=' atom ('+' atom)* ';' => ID '=' '(' expr ')' ('+' atom)* ';' => ID '=' '(' atom ')' ('+' atom)* ';' => ID '=' '(' INT ')' ('+' atom)* ';' => ID '=' '(' INT ')' ';' At the "3" token, you'd have a call chain of stat -> expr -> atom -> expr -> atom What can follow that specific nested ref to atom? Exactly ')' as you can see by looking at the derivation of this specific input. Contrast this with the FOLLOW(atom)={'+',')',';','.'}. You want the exact viable token set when recovering from a token mismatch. Upon token mismatch, if LA(1) is member of the viable next token set, then you know there is most likely a missing token in the input stream. "Insert" one by just not throwing an exception.


combineFollows

protected BitSet combineFollows(boolean exact)

recoverFromMismatchedToken

public void recoverFromMismatchedToken(IntStream input,
                                       RecognitionException e,
                                       int ttype,
                                       BitSet follow)
                                throws RecognitionException
Attempt to recover from a single missing or extra token. EXTRA TOKEN LA(1) is not what we are looking for. If LA(2) has the right token, however, then assume LA(1) is some extra spurious token. Delete it and LA(2) as if we were doing a normal match(), which advances the input. MISSING TOKEN If current token is consistent with what could come after ttype then it is ok to "insert" the missing token, else throw exception For example, Input "i=(3;" is clearly missing the ')'. When the parser returns from the nested call to expr, it will have call chain: stat -> expr -> atom and it will be trying to match the ')' at this point in the derivation: => ID '=' '(' INT ')' ('+' atom)* ';' ^ match() will see that ';' doesn't match ')' and report a mismatched token error. To recover, it sees that LA(1)==';' is in the set of tokens that can follow the ')' token reference in rule atom. It can assume that you forgot the ')'.

Throws:
RecognitionException

recoverFromMismatchedSet

public void recoverFromMismatchedSet(IntStream input,
                                     RecognitionException e,
                                     BitSet follow)
                              throws RecognitionException
Throws:
RecognitionException

recoverFromMismatchedElement

protected boolean recoverFromMismatchedElement(IntStream input,
                                               RecognitionException e,
                                               BitSet follow)
This code is factored out from mismatched token and mismatched set recovery. It handles "single token insertion" error recovery for both. No tokens are consumed to recover from insertions. Return true if recovery was possible else return false.


consumeUntil

public void consumeUntil(IntStream input,
                         int tokenType)

consumeUntil

public void consumeUntil(IntStream input,
                         BitSet set)
Consume tokens until one matches the given token set


pushFollow

protected void pushFollow(BitSet fset)
Push a rule's follow set using our own hardcoded stack


getRuleInvocationStack

public java.util.List getRuleInvocationStack()
Return List of the rules in your parser instance leading up to a call to this method. You could override if you want more details such as the file/line info of where in the parser java code a rule is invoked. This is very useful for error messages and for context-sensitive error recovery.


getRuleInvocationStack

public static java.util.List getRuleInvocationStack(java.lang.Throwable e,
                                                    java.lang.String recognizerClassName)
A more general version of getRuleInvocationStack where you can pass in, for example, a RecognitionException to get it's rule stack trace. This routine is shared with all recognizers, hence, static. TODO: move to a utility class or something; weird having lexer call this


getBacktrackingLevel

public int getBacktrackingLevel()

getTokenNames

public java.lang.String[] getTokenNames()
Used to print out token names like ID during debugging and error reporting. The generated parsers implement a method that overrides this to point to their String[] tokenNames.


getGrammarFileName

public java.lang.String getGrammarFileName()
For debugging and other purposes, might want the grammar name. Have ANTLR generate an implementation for this method.


toStrings

public java.util.List toStrings(java.util.List tokens)
A convenience method for use most often with template rewrites. Convert a List to List


getRuleMemoization

public int getRuleMemoization(int ruleIndex,
                              int ruleStartIndex)
Given a rule number and a start token index number, return MEMO_RULE_UNKNOWN if the rule has not parsed input starting from start index. If this rule has parsed input starting from the start index before, then return where the rule stopped parsing. It returns the index of the last token matched by the rule. For now we use a hashtable and just the slow Object-based one. Later, we can make a special one for ints and also one that tosses out data after we commit past input position i.


alreadyParsedRule

public boolean alreadyParsedRule(IntStream input,
                                 int ruleIndex)
Has this rule already parsed input at the current index in the input stream? Return the stop token index or MEMO_RULE_UNKNOWN. If we attempted but failed to parse properly before, return MEMO_RULE_FAILED. This method has a side-effect: if we have seen this input for this rule and successfully parsed before, then seek ahead to 1 past the stop token matched for this rule last time.


memoize

public void memoize(IntStream input,
                    int ruleIndex,
                    int ruleStartIndex)
Record whether or not this rule parsed the input at this position successfully. Use a standard java hashtable for now.


getRuleMemoizationCacheSize

public int getRuleMemoizationCacheSize()
return how many rule/input-index pairs there are in total. TODO: this includes synpreds. :(


traceIn

public void traceIn(java.lang.String ruleName,
                    int ruleIndex,
                    java.lang.Object inputSymbol)

traceOut

public void traceOut(java.lang.String ruleName,
                     int ruleIndex,
                     java.lang.Object inputSymbol)