schrodinger.application.desmond.antlr3.streams.TokenRewriteStream

Class TokenRewriteStream

object --+            
         |            
 IntStream --+        
             |        
   TokenStream --+    
                 |    
 CommonTokenStream --+
                     |
                    TokenRewriteStream

@brief CommonTokenStream that can be modified.

Useful for dumping out the input stream after doing some
augmentation or other manipulations.

You can insert stuff, replace, and delete chunks.  Note that the
operations are done lazily--only if you convert the buffer to a
String.  This is very efficient because you are not moving data around
all the time.  As the buffer of tokens is converted to strings, the
toString() method(s) check to see if there is an operation at the
current index.  If so, the operation is done and then normal String
rendering continues on the buffer.  This is like having multiple Turing
machine instruction streams (programs) operating on a single input tape. :)

Since the operations are done lazily at toString-time, operations do not
screw up the token index values.  That is, an insert operation at token
index i does not change the index values for tokens i+1..n-1.
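
For example, in the Python runtime (a minimal sketch; 'lexer' stands in
for any ANTLR3-generated lexer instance whose input yields at least two
default-channel tokens):

    tokens = TokenRewriteStream(lexer)
    tokens.fillBuffer()

    tokens.insertAfter(0, " INSERTED")  # queue an edit; nothing moves yet
    assert tokens.get(1).index == 1     # token indices are unchanged
    print(tokens.toOriginalString())    # the untouched original text
    print(tokens.toString())            # original text plus the insert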

Because operations never actually alter the buffer, you may always get
the original token stream back without undoing anything.  Since
the instructions are queued up, you can easily simulate transactions and
roll back any changes if there is an error just by removing instructions.
For example,

 CharStream input = new ANTLRFileStream("input");
 TLexer lex = new TLexer(input);
 TokenRewriteStream tokens = new TokenRewriteStream(lex);
 T parser = new T(tokens);
 parser.startRule();

 Then in the rules, you can execute:
    Token t,u;
    ...
    input.insertAfter(t, "text to put after t");
    input.insertAfter(u, "text after u");
    System.out.println(tokens.toString());

Actually, you have to cast the 'input' to a TokenRewriteStream. :(
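
In the Python runtime the equivalent is (a sketch; TLexer and TParser
stand in for the classes ANTLR generates from your grammar T, and no
cast is needed because 'tokens' already carries the rewrite methods):

    import antlr3

    input = antlr3.ANTLRFileStream("input")
    lex = TLexer(input)
    tokens = antlr3.TokenRewriteStream(lex)
    parser = TParser(tokens)
    parser.startRule()

    # then, inside rule actions (t and u are Token objects):
    tokens.insertAfter(t, "text to put after t")
    tokens.insertAfter(u, "text after u")
    print(tokens.toString())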

You can also have multiple "instruction streams" and get multiple
rewrites from a single pass over the input.  Just name the instruction
streams and use that name again when printing the buffer.  This could be
useful for generating a C file and also its header file--all from the
same buffer:

    tokens.insertAfter("pass1", t, "text to put after t");}
    tokens.insertAfter("pass2", u, "text after u");}
    System.out.println(tokens.toString("pass1"));
    System.out.println(tokens.toString("pass2"));

If you don't use named rewrite streams, a "default" stream is used as
the first example shows.
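
The same multi-pass idiom in the Python runtime (t and u as above):

    tokens.insertAfter("pass1", t, "text to put after t")
    tokens.insertAfter("pass2", u, "text after u")
    print(tokens.toString("pass1"))
    print(tokens.toString("pass2"))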

Instance Methods
 
__init__(self, tokenSource=None, channel=0)
@param tokenSource A TokenSource instance (usually a Lexer) to pull the tokens from.
 
rollback(self, *args)
Rollback the instruction stream for a program so that the indicated instruction (via instructionIndex) is no longer in the stream.
 
deleteProgram(self, programName='default')
Reset the program so that no instructions exist.
 
insertAfter(self, *args)
 
insertBefore(self, *args)
 
replace(self, *args)
 
delete(self, *args)
 
getLastRewriteTokenIndex(self, programName='default')
 
setLastRewriteTokenIndex(self, programName, i)
 
getProgram(self, name)
 
initializeProgram(self, name)
 
toOriginalString(self, start=None, end=None)
 
toString(self, *args)
Return the text of all tokens from start to stop, inclusive.
 
__str__(self, *args)
Return the text of all tokens from start to stop, inclusive.
 
reduceToSingleOperationPerIndex(self, rewrites)
We need to combine operations and report invalid operations (like overlapping replaces that are not completely nested).
 
catOpText(self, a, b)
 
getKindOfOps(self, rewrites, kind, before=None)
 
toDebugString(self, start=None, end=None)

Inherited from CommonTokenStream: LA, LB, LT, consume, discardTokenType, fillBuffer, get, getSourceName, getTokenSource, getTokens, index, mark, release, reset, rewind, seek, setTokenSource, setTokenTypeChannel, size, skipOffTokenChannels, skipOffTokenChannelsReverse

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __subclasshook__

Class Variables
  DEFAULT_PROGRAM_NAME = 'default'
  MIN_TOKEN_INDEX = 0
Properties

Inherited from object: __class__

Method Details

__init__(self, tokenSource=None, channel=0)
(Constructor)

 

@param tokenSource A TokenSource instance (usually a Lexer) to pull
    the tokens from.

@param channel Skip tokens on any channel but this one; this is how we
    skip whitespace...
    

Overrides: object.__init__
(inherited documentation)
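
A minimal construction sketch for the Python runtime (MyLexer is a
hypothetical stand-in for any generated lexer; DEFAULT_CHANNEL is the
runtime's constant for channel 0):

    import antlr3

    lexer = MyLexer(antlr3.ANTLRStringStream("some input"))

    # Keep only default-channel tokens; whitespace the grammar sends
    # to the hidden channel is skipped automatically.
    tokens = TokenRewriteStream(lexer, channel=antlr3.DEFAULT_CHANNEL)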

rollback(self, *args)

 

Rollback the instruction stream for a program so that the indicated instruction (via instructionIndex) is no longer in the stream. UNTESTED!
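
A hedged sketch of the transaction idiom this enables
(risky_transformation is a hypothetical helper; since rollback is
marked UNTESTED, verify the behaviour before relying on it):

    program = tokens.getProgram(TokenRewriteStream.DEFAULT_PROGRAM_NAME)
    marker = len(program)              # instruction count before our edits
    tokens.insertBefore(0, "speculative ")
    try:
        risky_transformation(tokens)   # may queue more edits, then fail
    except Exception:
        tokens.rollback(marker)        # drop every instruction queued above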

toString(self, *args)

 

Return the text of all tokens from start to stop, inclusive. If the stream does not buffer all the tokens, it can just return "" or None; users should, of course, not access $ruleLabel.text in an action in that case.

Because the user is not required to use a token with an index stored in it, we must provide a means for two token objects themselves to indicate the start/end location. Most often this will just delegate to the other toString(int,int). This is also parallel with the TreeNodeStream.toString(Object,Object).

Overrides: TokenStream.toString
(inherited documentation)
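
Illustrative calls (assuming the buffer holds at least six tokens and a
"pass1" program exists):

    print(tokens.toString())         # whole buffer, default program
    print(tokens.toString(0, 5))     # tokens 0..5 inclusive
    print(tokens.toString("pass1"))  # whole buffer, the "pass1" program
    print(str(tokens))               # __str__ delegates to toString()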

__str__(self, *args)
(Informal representation operator)

 

Return the text of all tokens from start to stop, inclusive. If the stream does not buffer all the tokens, it can just return "" or None; users should, of course, not access $ruleLabel.text in an action in that case.

Because the user is not required to use a token with an index stored in it, we must provide a means for two token objects themselves to indicate the start/end location. Most often this will just delegate to the other toString(int,int). This is also parallel with the TreeNodeStream.toString(Object,Object).

Overrides: object.__str__
(inherited documentation)

reduceToSingleOperationPerIndex(self, rewrites)

 

We need to combine operations and report invalid operations (like
overlapping replaces that are not completely nested).  Inserts to the
same index need to be combined, etc.  Here are the cases:

I.i.u I.j.v                           leave alone, nonoverlapping
I.i.u I.i.v                           combine: Iivu

R.i-j.u R.x-y.v | i-j in x-y          delete first R
R.i-j.u R.i-j.v                       delete first R
R.i-j.u R.x-y.v | x-y in i-j          ERROR
R.i-j.u R.x-y.v | boundaries overlap  ERROR

I.i.u R.x-y.v   | i in x-y            delete I
I.i.u R.x-y.v   | i not in x-y        leave alone, nonoverlapping
R.x-y.v I.i.u   | i in x-y            ERROR
R.x-y.v I.x.u                         R.x-y.uv (combine, delete I)
R.x-y.v I.i.u   | i not in x-y        leave alone, nonoverlapping

I.i.u = insert u before op @ index i
R.x-y.u = replace x-y indexed tokens with u

First we need to examine replaces.  For any replace op:

  1. Wipe out any insertions before the op within that range.
  2. Drop any earlier replace op that is contained completely within
     that range.
  3. Throw an exception upon boundary overlap with any previous replace.

Then we can deal with inserts:

  1. For any inserts to the same index, combine them even if not adjacent.
  2. For any prior replace with the same left boundary, combine this
     insert with the replace and delete the replace.
  3. Throw an exception if the index is in the same range as a previous
     replace.

Don't actually delete ops; set them to None in the list, which makes the
list easier to walk.  Later we can throw an exception as we add entries
to the index -> op map.

Note that I.2 R.2-2 will wipe out I.2 even though, technically, the
inserted stuff would be before the replace range.  But, if you
add tokens in front of a method body '{' and then delete the method
body, I think the stuff before the '{' you added should disappear too.

Return a map from token index to operation.
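
Two of these cases driven through the public API (a sketch; 'tokens'
wraps some buffered input, and the reduction only runs when toString()
is called):

    tokens.insertBefore(2, "u")
    tokens.insertBefore(2, "v")  # I.2.u + I.2.v combine: "vu" precedes token 2

    tokens.insertBefore(6, "u")
    tokens.replace(5, 7, "w")    # R.5-7.w covers index 6: the insert is
                                 # dropped and tokens 5-7 render as "w"
    print(tokens.toString())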