Byte code generation to support a DSL - 101
As part of my investigations into the benefits of model driven techniques and DSLs I’ve been looking into some Java byte-code generation mechanisms. After I’d managed to get my masochistic hacker tendencies under control, I decided not to write a whole new class file generation framework myself and instead downloaded ASM and BCEL.
Essentially, I wanted to be able to generate Java classes on the fly from a DSL, and because I wanted to look at both ASM and BCEL, I set up the basics of my scenario to allow different class generators to be plugged in.
To keep things simple for this 101 scenario, I used postfix notation as my DSL; yeh ok, so it’s not really a DSL as such, but the principles are the same and I didn’t want a more complex grammar screwing up my investigations into code generation.
The key point is that every generated class implements a normally compiled Java interface containing a single, yet important, method: int eval(). It’s this method that the client code (or, for me, the test cases) calls to perform the necessary logic. The postfix strings drive the generation mechanism itself; they are not parsed and interpreted at runtime.
To load these dynamic classes, I created a class loader that asks the generator for the class’s bytes whenever the requested class is dynamic. To decide whether a class is dynamic or not, I use properties, where the key is the class name and the value is the postfix expression to be evaluated. For example, for some of my test cases I have the following sample.postfix.properties file :
postfix.A=2 3 +
postfix.B=2 5 + 1 -
postfix.C=3 5 + 5 %
postfix.D=6 7 * 6 - 2 /
postfix.E=6 7 * 3 2 * /
So loading the class “postfix.A” will create an implementation that evaluates “2 3 +” (i.e. resulting in 5) and loading “postfix.E” will give the result of “6 7 * 3 2 * /” (i.e. 7).
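To make the class loading side a bit more concrete, here is a minimal sketch of a loader that uses the properties to decide whether a class is dynamic. It is not the code from the download: the names ClassGenerator and PostfixClassLoader are made up for illustration, and the generator interface is only my assumption of what a pluggable ASM or BCEL back end might look like.

```java
import java.util.Properties;

/** Hypothetical plug-in point: turns a postfix expression into class file bytes. */
interface ClassGenerator {
    byte[] generate(String className, String postfixExpression);
}

/** Generates classes that are named in the properties; everything else is left to the parent loader. */
class PostfixClassLoader extends ClassLoader {
    private final Properties expressions;   // key = class name, value = postfix expression
    private final ClassGenerator generator; // e.g. an ASM or BCEL based implementation

    PostfixClassLoader(ClassLoader parent, Properties expressions, ClassGenerator generator) {
        super(parent);
        this.expressions = expressions;
        this.generator = generator;
    }

    // loadClass() only calls this after the parent loader has failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        String expression = expressions.getProperty(name);
        if (expression == null) {
            throw new ClassNotFoundException(name); // not one of the dynamic classes
        }
        byte[] bytes = generator.generate(name, expression);
        if (bytes == null) {
            throw new ClassNotFoundException(name); // generator declined to produce the class
        }
        return defineClass(name, bytes, 0, bytes.length);
    }
}
```

With a real generator plugged in, the client would call loadClass("postfix.A"), create an instance, cast it to the compiled interface and call eval(). Overriding findClass rather than loadClass keeps the usual parent-first delegation, so ordinary classes are still loaded normally.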
To give an idea of what we’re trying to generate, the postfix expression “2 3 +” should generate byte code similar to this for the eval method :
public int eval();
Code:
0: ldc #13; //int 2
2: ldc #14; //int 3
4: iadd
5: ireturn
… and the postfix expression “6 7 * 3 2 * /” should generate byte code similar to this :
public int eval();
Code:
0: ldc #13; //int 6
2: ldc #14; //int 7
4: imul
5: ldc #15; //int 3
7: ldc #16; //int 2
9: imul
10: idiv
11: ireturn
[For reference, the above byte code dumps were performed using the standard Java javap tool]
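The mapping from postfix tokens to instructions is almost one to one, which is what makes postfix such a convenient starting point. The sketch below shows that mapping using plain strings instead of a real byte code library; PostfixTranslator is a hypothetical name, and the output mimics the mnemonics in the dumps above rather than real javap output (there are no offsets or constant pool indexes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PostfixTranslator {
    // Each supported operator maps to exactly one JVM int instruction.
    private static final Map<String, String> OPERATORS = Map.of(
            "+", "iadd", "-", "isub", "*", "imul", "/", "idiv", "%", "irem");

    /** Returns the instruction mnemonics for the body of eval(). */
    static List<String> translate(String postfix) {
        List<String> code = new ArrayList<>();
        int depth = 0; // operands on the stack so far, used to reject malformed input
        for (String token : postfix.trim().split("\\s+")) {
            String opcode = OPERATORS.get(token);
            if (opcode == null) {
                // Anything that is not an operator must be an int literal.
                code.add("ldc " + Integer.parseInt(token));
                depth++;
            } else {
                if (depth < 2) {
                    throw new IllegalArgumentException("Operator '" + token + "' needs two operands");
                }
                code.add(opcode);
                depth--; // pops two operands, pushes one result
            }
        }
        if (depth != 1) {
            throw new IllegalArgumentException("Expression must leave exactly one value: " + postfix);
        }
        code.add("ireturn");
        return code;
    }
}
```

For example, PostfixTranslator.translate("2 3 +") gives the list ldc 2, ldc 3, iadd, ireturn. A real generator has the same loop shape, with each string replaced by a call into ASM or BCEL.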
To keep things a little more brief, rather than pasting some of the actual code, which includes tokenizing strings and looping, I’ll just put in some code samples to show the “2 3 +” scenario for each style of generator implementation I created. The initial ASM method that I used provided code like this :
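// cw is the ClassWriter for the class being generated
MethodVisitor mv = cw.visitMethod(ACC_PUBLIC, "eval", "()I", null, null);
mv.visitCode();
mv.visitLdcInsn(Integer.valueOf(2)); // push the operand 2
mv.visitLdcInsn(Integer.valueOf(3)); // push the operand 3
mv.visitInsn(IADD);                  // pop both operands, push their sum
mv.visitInsn(IRETURN);               // return the int on top of the stack
mv.visitMaxs(0, 0); // values are calc’d for me (assuming the ClassWriter was created with COMPUTE_MAXS)
mv.visitEnd();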
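As an aside on that visitMaxs(0, 0) call: for a straight-line postfix expression the maximum stack depth is easy to work out by hand, which gives a feel for what the library is calculating. The sketch below is only an illustration for this simple case (StackDepth is a made-up name, and it assumes a well-formed expression); ASM itself works on the generated instructions and copes with branches, which this does not.

```java
class StackDepth {
    /** Maximum operand stack depth needed to evaluate a well-formed postfix expression. */
    static int maxStack(String postfix) {
        int depth = 0;
        int max = 0;
        for (String token : postfix.trim().split("\\s+")) {
            boolean operator = token.length() == 1 && "+-*/%".indexOf(token.charAt(0)) >= 0;
            // An operand pushes one value; a binary operator pops two and pushes one.
            depth += operator ? -1 : 1;
            max = Math.max(max, depth);
        }
        return max;
    }
}
```

So “2 3 +” needs a stack of 2, while “6 7 * 3 2 * /” needs 3, because the first product is still on the stack when 3 and 2 are pushed.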
There is an alternative, easier-to-read syntax available in ASM. The documentation states that it has poorer performance, but that’s not something I’m anywhere close to being able to verify :
Method m = Method.getMethod("int eval ()");
GeneratorAdapter mg = new GeneratorAdapter(ACC_PUBLIC, m, null, null, classWriter);
mg.push(2);
mg.push(3);
mg.math(GeneratorAdapter.ADD, Type.INT_TYPE);
mg.returnValue();
mg.endMethod();
Finally, the BCEL code looks like this :
InstructionList il = new InstructionList();
String[] argNames = {};
String methodName = "eval";
String className = cg.getClassName();
MethodGen method = new MethodGen(ACC_PUBLIC, Type.INT, Type.NO_ARGS, argNames, methodName, className, il, cp);
il.append(new PUSH(cp, 2));
il.append(new PUSH(cp, 3));
il.append(InstructionConstants.IADD);
il.append(factory.createReturn(Type.INT));
method.setMaxStack();
method.setMaxLocals();
cg.addMethod(method.getMethod());
il.dispose();
I’d initially thought that I was going to prefer BCEL for this, but I was wrong, and it all comes down to tooling and documentation. ASM has an Eclipse plugin that’s really helpful in showing the byte code for the classes in the workspace; it can also show the ASM code needed to produce those classes. BCEL has the BCELifier utility that does the same, but it’s not integrated into the IDE, and I had to use it (to get the BCEL code for one of my ASM generated classes) because the BCEL documentation is a little lacking.
The conclusion for me is that they’re both quite easy to use, and on-the-fly class generation is certainly made simpler using these libraries. I’m likely to use ASM more than BCEL right now, but I think it’s going to be essential to keep an eye on the actual support for more complex aspects of code generation, which is going to become more critical as my DSLs become more real world.
[The actual source for the scenario I’m referring to in this post can be downloaded here]
October 3rd, 2007 at 11:15 am
Ah… so now I know what you’ve been doing besides the gardening!!!