Bytecode features not available in the Java language

Bart van Heukelom

Are there currently (Java 6) things you can do in Java bytecode that you can’t do from within the Java language?

I know both are Turing complete, so read “can do” as “can do significantly faster/better, or just in a different way”.

I’m thinking of extra bytecodes like invokedynamic, which can’t be generated using Java, except that this specific one is for a future version.

  • Define “things”. In the end, both the Java language and Java bytecode are Turing complete…

    – Michael Borgwardt

    Jul 26, 2011 at 8:42

  • Is this the real question: is there any advantage to programming in byte code, e.g. using Jasmin, instead of Java?

    – Peter Lawrey

    Jul 26, 2011 at 9:05

  • Like rol in assembler, which you can’t write in C++.

    – Martijn Courteaux

    Jul 26, 2011 at 9:09

  • It’s a very poor optimizing compiler that can’t compile (x<<n)|(x>>(32-n)) to a rol instruction.

    – Random832

    Jul 26, 2011 at 13:36

Rafael Winterhalter

After working with Java byte code for quite a while and doing some additional research on this matter, here is a summary of my findings:

Execute code in a constructor before calling a super constructor or auxiliary constructor

In the Java programming language (JPL), a constructor’s first statement must be an invocation of a super constructor or of another constructor of the same class. This is not true for Java byte code (JBC). Within byte code, it is absolutely legitimate to execute any code before a constructor call, as long as:

  • Another compatible constructor is called at some point after this code block.
  • This call is not within a conditional statement.
  • Before this constructor call, no field of the constructed instance is read and none of its methods is invoked. This implies the next item.

Set instance fields before calling a super constructor or auxiliary constructor

As mentioned before, it is perfectly legal to set a field value of an instance before calling another constructor. There even exists a legacy hack which makes it possible to exploit this “feature” in Java versions before 6:

class Foo {
  public String s;
  public Foo() {
    System.out.println(s);
  }
}

class Bar extends Foo {
  public Bar() {
    this(s = "Hello World!");
  }
  private Bar(String helper) {
    super();
  }
}

This way, a field could be set before the super constructor is invoked, which is however no longer possible. In JBC, this behavior can still be implemented.

Branch a super constructor call

In Java, it is not possible to define a constructor call like

class Foo {
  Foo() { }
  Foo(Void v) { }
}

class Bar extends Foo {
  Bar() {
    // Rejected by javac: the constructor call must not sit inside a branch.
    if (System.currentTimeMillis() % 2 == 0) {
      super();
    } else {
      super(null);
    }
  }
}

Until Java 7u23, the HotSpot VM’s verifier did however miss this check, which is why it was possible. This was used by several code generation tools as a sort of hack, but it is no longer legal to implement a class like this.

The latter was merely a bug in this compiler version. In newer compiler versions, this is again possible.

Define a class without any constructor

The Java compiler will always implement at least one constructor for any class. In Java byte code, this is not required. This allows the creation of classes that cannot be constructed, even when using reflection. However, using sun.misc.Unsafe still allows for the creation of such instances.
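The sun.misc.Unsafe route can be sketched from plain Java. This is only an illustration under assumptions: the Guarded class is hypothetical and does have a constructor (javac cannot emit a class without one), sun.misc.Unsafe is an unsupported, HotSpot-specific API, and the reflective access to its theUnsafe field may behave differently on other or future JVMs.

```java
import java.lang.reflect.Field;

class UnsafeAllocationDemo {
    // Hypothetical class whose only constructor always sets the field.
    static class Guarded {
        int value;
        Guarded() { value = 42; }
    }

    // Allocates a Guarded instance without running any constructor.
    static Guarded allocateWithoutConstructor() {
        try {
            // The Unsafe singleton is only reachable through the private
            // static field "theUnsafe" (unsupported API, an assumption here).
            Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            sun.misc.Unsafe unsafe = (sun.misc.Unsafe) f.get(null);
            return (Guarded) unsafe.allocateInstance(Guarded.class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Guarded().value);                // constructor ran
        System.out.println(allocateWithoutConstructor().value); // constructor skipped
    }
}
```

Because no constructor runs, the field keeps its default value, which is also what would happen for a class that has no constructor at all.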

Define methods with identical signature but with different return type

In the JPL, a method is identified as unique by its name and its raw parameter types. In JBC, the raw return type is additionally considered.
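One place where javac itself relies on this is covariant return types: the compiler emits a synthetic bridge method next to the overriding method. The following sketch (class names are made up for illustration) uses reflection to show two methods named get with the same parameter list in one class file.

```java
import java.lang.reflect.Method;

class ReturnTypeOverloadDemo {
    static class Base {
        Object get() { return "base"; }
    }

    static class Derived extends Base {
        // Covariant return type: javac also emits a synthetic bridge
        // method "Object get()" that delegates to this method.
        @Override
        String get() { return "derived"; }
    }

    // Counts the methods named "get" that are declared directly in Derived.
    static int countGetMethods() {
        int count = 0;
        for (Method m : Derived.class.getDeclaredMethods()) {
            if (m.getName().equals("get")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (Method m : Derived.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + "() bridge=" + m.isBridge());
        }
    }
}
```

Writing both methods by hand in source form would be rejected by the compiler; only the generated bridge is allowed to differ by return type alone.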

Define fields that do not differ by name but only by type

A class file can contain several fields of the same name as long as they declare a different field type. The JVM always refers to a field as a tuple of name and type.

Throw undeclared checked exceptions without catching them

The Java runtime and the Java byte code are not aware of the concept of checked exceptions. It is only the Java compiler that verifies that checked exceptions are always either caught or declared if they are thrown.
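The generic “sneaky throw” workaround mentioned elsewhere in this thread makes the point visible from source code: once the compiler is talked out of its check, the runtime propagates the checked exception without complaint. A minimal sketch, with hypothetical class and method names:

```java
import java.io.IOException;

class SneakyThrowDemo {
    // The cast to E is erased at runtime, so nothing is actually checked;
    // the compiler only sees "throws E".
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
        throw (E) t;
    }

    // No "throws IOException" clause, yet an IOException escapes.
    static void failWithoutDeclaring() {
        SneakyThrowDemo.<RuntimeException>sneakyThrow(new IOException("undeclared"));
    }

    // Returns the class name of whatever failWithoutDeclaring() threw.
    static String caughtType() {
        try {
            failWithoutDeclaring();
            return "nothing thrown";
        } catch (Exception e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(caughtType());
    }
}
```

Byte code needs no such trick: an athrow instruction with any Throwable on the stack is sufficient.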

Use dynamic method invocation outside of lambda expressions

The so-called dynamic method invocation can be used for anything, not only for Java’s lambda expressions. Using this feature makes it possible, for example, to switch out execution logic at runtime. Many dynamic programming languages that boil down to JBC improved their performance by using this instruction. In Java byte code, you could also emulate lambda expressions in Java 7, where the compiler did not yet allow for any use of dynamic method invocation while the JVM already understood the instruction.
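The instruction itself cannot be written in Java source, but the java.lang.invoke API that backs it can be used directly. The sketch below is an approximation of “switching out execution logic at runtime”: it relinks a MutableCallSite by hand, whereas a real invokedynamic call site would be linked once by a bootstrap method. The class name is made up for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class CallSiteDemo {
    // Invokes the same call site twice, retargeting it in between.
    static String demo() {
        try {
            // A call site of type ()String whose target can be replaced later.
            MutableCallSite site = new MutableCallSite(MethodType.methodType(String.class));
            site.setTarget(MethodHandles.constant(String.class, "first"));

            // The invoker always dispatches to the site's current target.
            MethodHandle invoker = site.dynamicInvoker();
            String before = (String) invoker.invokeExact();

            site.setTarget(MethodHandles.constant(String.class, "second"));
            String after = (String) invoker.invokeExact();
            return before + "," + after;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```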

Use identifiers that are not normally considered legal

Ever fancied using spaces and a line break in your method’s name? Create your own JBC and good luck with code review. The only illegal characters for identifiers are ., ;, [ and /. Additionally, methods that are not named <init> or <clinit> cannot contain < and >.
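The naming rule above is small enough to write down as a check. This is a sketch of the rule as stated here, not the verifier’s actual implementation; the class and method names are hypothetical, and the only extra assumption is that an empty name is rejected.

```java
class JvmNames {
    // Field, local variable and similar names: anything except . ; [ /
    static boolean isLegalUnqualifiedName(String name) {
        if (name.isEmpty()) {
            return false; // assumption: a name needs at least one character
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.' || c == ';' || c == '[' || c == '/') {
                return false;
            }
        }
        return true;
    }

    // Method names additionally must not contain < or >,
    // except for the two special names <init> and <clinit>.
    static boolean isLegalMethodName(String name) {
        if (name.equals("<init>") || name.equals("<clinit>")) {
            return true;
        }
        return isLegalUnqualifiedName(name)
                && name.indexOf('<') < 0
                && name.indexOf('>') < 0;
    }

    public static void main(String[] args) {
        System.out.println(isLegalMethodName("hello world\n")); // legal for the JVM
        System.out.println(isLegalMethodName("a.b"));           // contains '.'
    }
}
```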

Reassign final parameters or the this reference

final parameters do not exist in JBC and can consequently be reassigned. Any parameter, including the this reference, is only stored in a simple array within the JVM, which makes it possible to reassign the this reference at index 0 within a single method frame.

Reassign final fields

As long as a final field is assigned within a constructor, it is legal to reassign this value or even not assign a value at all. Therefore, the following two constructors are legal:

class Foo {
  final int bar;
  Foo() { } // bar == 0
  Foo(Void v) { // bar == 2
    bar = 1;
    bar = 2;
  }
}

For static final fields, it is even allowed to reassign the fields outside of
the class initializer.

Treat constructors and the class initializer as if they were methods

This is more of a conceptual feature, but constructors are not treated any differently within JBC than normal methods. It is only the JVM’s verifier that assures that constructors call another legal constructor. Other than that, it is merely a Java naming convention that constructors must be called <init> and that the class initializer is called <clinit>. Besides this difference, the representation of methods and constructors is identical. As Holger pointed out in a comment, you can even define constructors with return types other than void or a class initializer with arguments, even though it is not possible to call these methods.

Create asymmetric records

When creating a record

record Foo(Object bar) { }

javac will generate a class file with a single field named bar, an accessor method named bar() and a constructor taking a single Object. Additionally, a record attribute for bar is added. By manually generating a record, it is possible to create a different constructor shape, to skip the field and to implement the accessor differently. At the same time, it is still possible to make the reflection API believe that the class represents an actual record.

Call any super method (until Java 1.1)

This is only possible for Java versions 1 and 1.1. In JBC, methods are always dispatched on an explicit target type. This means that for

class Foo {
  void baz() { System.out.println("Foo"); }
}

class Bar extends Foo {
  @Override
  void baz() { System.out.println("Bar"); }
}

class Qux extends Bar {
  @Override
  void baz() { System.out.println("Qux"); }
}

it was possible to implement Qux#baz to invoke Foo#baz while jumping over Bar#baz. While it is still possible to define an explicit invocation that names a super method implementation other than that of the direct super class, this no longer has any effect in Java versions after 1.1. In Java 1.1, this behavior was controlled by the ACC_SUPER flag; setting it enables the behavior that only calls the direct super class’s implementation.

Define a non-virtual call of a method that is declared in the same class

In Java, a call to a non-private instance method of the same class is always virtual. Consider the following classes:

class Foo {
  void foo() {
    bar();
  }
  void bar() { }
}

class Bar extends Foo {
  @Override void bar() {
    throw new RuntimeException();
  }
}

The above code will always result in a RuntimeException when foo is invoked on an instance of Bar. It is not possible to define the Foo::foo method to invoke its own bar method which is defined in Foo. As bar is a non-private instance method, the call is always virtual. With byte code, one can however define the invocation to use the INVOKESPECIAL opcode, which directly links the bar method call in Foo::foo to Foo’s version. This opcode is normally used to implement super method invocations, but you can reuse the opcode to implement the described behavior.
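From Java source, the closest approximation is a method handle obtained through MethodHandles.Lookup.findSpecial, which is documented to behave like an invokespecial instruction issued from within the given caller class. The sketch below adapts the example so that the methods return strings instead of throwing; the wrapper class name is hypothetical, and the reflective lookup is of course not the same thing as emitting the opcode directly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class NonVirtualCallDemo {
    static class Foo {
        String foo() {
            try {
                // Early-bound handle to Foo's own bar(), looked up from inside Foo.
                MethodHandle ownBar = MethodHandles.lookup().findSpecial(
                        Foo.class, "bar", MethodType.methodType(String.class), Foo.class);
                // Bypasses any override on the receiver.
                return (String) ownBar.invoke(this);
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        }

        String bar() { return "Foo.bar"; }
    }

    static class Bar extends Foo {
        @Override
        String bar() { return "Bar.bar"; }
    }

    public static void main(String[] args) {
        System.out.println(new Bar().bar()); // ordinary virtual call
        System.out.println(new Bar().foo()); // non-virtual call inside Foo.foo
    }
}
```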

Fine-grain type annotations

In Java, annotations are applied according to the @Target that the annotation declares. Using byte code manipulation, it is possible to define annotations independently of this control. Also, it is for example possible to annotate a parameter type without annotating the parameter, even if the @Target annotation applies to both elements.

Define any attribute for a type or its members

Within the Java language, it is only possible to define annotations for fields, methods or classes. In JBC, you can basically embed any information into the Java classes. In order to make use of this information, you can however no longer rely on the Java class loading mechanism but you need to extract the meta information by yourself.

Overflow and implicitly assign byte, short, char and boolean values

The latter primitive types are not normally known in JBC but are only defined for array types or for field and method descriptors. Within byte code instructions, all of the named types take up 32 bits, which allows them to be represented as int. Officially, only the int, float, long and double types exist within byte code, and all of them need explicit conversion by the rules of the JVM’s verifier.

Not release a monitor

A synchronized block is actually made up of two statements, one to acquire and one to release a monitor. In JBC, you can acquire one without releasing it.

Note: In recent implementations of HotSpot, this instead leads to an IllegalMonitorStateException at the end of a method or to an implicit release if the method is terminated by an exception itself.

Add more than one return statement to a type initializer

In Java, even a trivial type initializer such as

class Foo {
  static {
    return;
  }
}

is illegal. In byte code, the type initializer is treated just as any other method, i.e. return statements can be defined anywhere.

Create irreducible loops

The Java compiler converts loops to goto statements in Java byte code. Such statements can be used to create irreducible loops, which the Java compiler never does.

Define a recursive catch block

In Java byte code, you can define a block:

try {
  throw new Exception();
} catch (Exception e) {
  // <goto on exception>: an exception thrown here jumps back to this handler
  throw new Exception();
}

A similar statement is created implicitly when using a synchronized block in Java, where any exception while releasing a monitor returns to the instruction for releasing this monitor. Normally, no exception should occur on such an instruction, but if one did (e.g. the deprecated ThreadDeath), the monitor would still be released.

Call any default method

The Java compiler requires several conditions to be fulfilled in order to allow a default method’s invocation:

  1. The method must be the most specific one (must not be overridden by a sub interface that is implemented by any type, including super types).
  2. The default method’s interface type must be implemented directly by the class that is calling the default method. However, if interface B extends interface A but does not override a method in A, the method can still be invoked.

For Java byte code, only the second condition counts; the first one is irrelevant.

Invoke a super method on an instance that is not this

The Java compiler only allows invoking a super (or interface default) method on this. In byte code, it is however also possible to invoke the super method on another instance of the same type, similar to the following:

class Foo {
  void m(Foo f) {
    f.super.toString(); // calls Object::toString
  }
  public String toString() {
    return "foo";
  }
}

Access synthetic members

In Java byte code, it is possible to access synthetic members directly. For example, consider how in the following example the outer instance of another Bar instance is accessed:

class Foo {
  class Bar { 
    void bar(Bar bar) {
      Foo foo = bar.Foo.this;
    }
  }
}

This is generally true for any synthetic field, class or method.

Define out-of-sync generic type information

While the Java runtime does not process generic types (after the Java compiler applies type erasure), this information is still attached to a compiled class as meta information and made accessible via the reflection API.

The verifier does not check the consistency of these String-encoded metadata values. It is therefore possible to define information on generic types that does not match the erasure. As a consequence, the following assertions can be true:

Method method = ...
// Compare contents; the two arrays are never the same object.
assertFalse(Arrays.equals(method.getParameterTypes(), method.getGenericParameterTypes()));

Field field = ...
assertTrue(field.getType() == String.class);
assertTrue(field.getGenericType() == Integer.class);

Also, the signature can be defined as invalid such that a runtime exception is thrown. This exception is thrown when the information is accessed for the first time as it is evaluated lazily. (Similar to annotation values with an error.)
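For comparison, this is what the two views look like when they agree, as they always do for javac output. The sketch uses a made-up field; the erased type comes from the field descriptor, the generic type from the separate signature metadata, and hand-written byte code could make the two disagree.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

class GenericSignatureDemo {
    // Descriptor says List; the generic signature metadata says List<String>.
    static List<String> names;

    // Returns the erased type and the first generic type argument of "names".
    static String describe() {
        try {
            Field field = GenericSignatureDemo.class.getDeclaredField("names");
            Class<?> erased = field.getType();      // read from the descriptor
            Type generic = field.getGenericType();  // parsed lazily from the signature
            Type argument = ((ParameterizedType) generic).getActualTypeArguments()[0];
            return erased.getSimpleName() + " / " + ((Class<?>) argument).getSimpleName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```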

Append parameter meta information only for certain methods

The Java compiler allows for embedding parameter name and modifier information when compiling a class with the -parameters flag enabled. In the Java class file format, this information is however stored per method, which makes it possible to embed it only for certain methods.

Mess things up and hard-crash your JVM

As an example, in Java byte code, you can define an invocation of any method on any type. Usually, the verifier will complain if a type does not know of such a method. However, if you invoke an unknown method on an array, I found a bug in some JVM version where the verifier will miss this and your JVM will crash once the instruction is executed. This is hardly a feature, but it is technically something that is not possible with javac-compiled Java. Java has some sort of double validation. The first validation is applied by the Java compiler, the second one by the JVM when a class is loaded. By skipping the compiler, you might find a weak spot in the verifier’s validation. This is rather a general statement than a feature, though.

Annotate a constructor’s receiver type when there is no outer class

Since Java 8, non-static methods and constructors of inner classes can declare a receiver type and annotate these types. Constructors of top-level classes cannot annotate their receiver type as they must not declare one.

class Foo {
  class Bar {
    Bar(@TypeAnnotation Foo Foo.this) { }
  }
  Foo() { } // Must not declare a receiver type
}

Since Foo.class.getDeclaredConstructor().getAnnotatedReceiverType() does however return an AnnotatedType representing Foo, it is possible to include type annotations for Foo’s constructor directly in the class file where these annotations are later read by the reflection API.

Use unused / legacy byte code instructions

Since others named it, I will include it as well. Java formerly made use of subroutines through the JSR and RET instructions. JBC even knew its own type of a return address for this purpose. However, the use of subroutines overcomplicated static code analysis, which is why these instructions are no longer used. Instead, the Java compiler will duplicate the code it compiles. However, this basically creates identical logic, which is why I do not really consider it to achieve something different. Similarly, you could for example add the NOP byte code instruction, which is not used by the Java compiler either, but this would not really allow you to achieve something new either. As pointed out in the context, these mentioned “feature instructions” are now removed from the set of legal opcodes, which renders them even less of a feature.

  • Regarding method names, you can have more than one <clinit> method by defining methods with the name <clinit> but accepting parameters or having a non-void return type. But these methods are not very useful, the JVM will ignore them and the byte code can’t invoke them. The only use would be to confuse readers.

    – Holger

    Apr 24, 2014 at 15:30

  • I just discovered, that Oracle’s JVM detects an unreleased monitor at method exit and throws an IllegalMonitorStateException if you omitted the monitorexit instruction. And in case of an exceptional method exit that failed to do a monitorexit, it resets the monitor silently.

    – Holger

    Aug 26, 2014 at 13:22

  • @Holger – did not know that, I know that this was possible in earlier JVMs at least, JRockit even has its own handler for this kind of implementation. I’ll update the entry.

    – Rafael Winterhalter

    Aug 26, 2014 at 15:54

  • Well, the JVM specification does not mandate such a behavior. I just discovered it because I tried to create a dangling intrinsic lock using such non-standard byte code.

    – Holger

    Aug 26, 2014 at 16:29

  • Ok, I found the relevant spec: “Structured locking is the situation when, during a method invocation, every exit on a given monitor matches a preceding entry on that monitor. Since there is no assurance that all code submitted to the Java Virtual Machine will perform structured locking, implementations of the Java Virtual Machine are permitted but not required to enforce both of the following two rules guaranteeing structured locking. …”

    – Holger

    Aug 26, 2014 at 16:49

As far as I know there are no major features in the bytecodes supported by Java 6 that are not also accessible from Java source code. The main reason for this is obviously that the Java bytecode was designed with the Java language in mind.

There are some features that are not produced by modern Java compilers, however:

  • The ACC_SUPER flag:

    This is a flag that can be set on a class and specifies how a specific corner case of the invokespecial bytecode is handled for this class. It is set by all modern Java compilers (where “modern” is >= Java 1.1, if I remember correctly) and only ancient Java compilers produced class files where this was un-set. This flag exists only for backwards-compatibility reasons. Note that starting with Java 7u51, ACC_SUPER is ignored completely due to security reasons.

  • The jsr/ret bytecodes.

    These bytecodes were used to implement sub-routines (mostly for implementing finally blocks). They are no longer produced since Java 6. The reason for their deprecation is that they complicate static verification a lot for no great gain (i.e. code that uses them can almost always be re-implemented with normal jumps with very little overhead).

  • Having two methods in a class that only differ in return type.

    The Java language specification does not allow two methods in the same class when they differ only in their return type (i.e. same name, same argument list, …). The JVM specification however, has no such restriction, so a class file can contain two such methods, there’s just no way to produce such a class file using the normal Java compiler. There’s a nice example/explanation in this answer.
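Regarding the ACC_SUPER flag from the list above: whether a given class file carries it can be checked by reading the access flags that follow the constant pool. The sketch below is a minimal hand-rolled reader, not a general class file parser; the class name is hypothetical, it assumes Java 9 or later for readAllBytes, and it assumes the class file is reachable as a resource.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class AccessFlagsDemo {
    static final int ACC_SUPER = 0x0020;

    // Reads the class-level access_flags item from the class file of the given type.
    static int readAccessFlags(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) throw new IOException("class file not found: " + resource);
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw.readAllBytes()));
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readUnsignedShort();             // minor version
            in.readUnsignedShort();             // major version
            int count = in.readUnsignedShort(); // constant pool count
            for (int i = 1; i < count; i++) {   // skip every constant pool entry
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: in.skipBytes(in.readUnsignedShort()); break;  // Utf8
                    case 3: case 4: case 9: case 10: case 11:
                    case 12: case 17: case 18: in.skipBytes(4); break;
                    case 5: case 6: in.skipBytes(8); i++; break;          // long/double use two slots
                    case 7: case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                    case 15: in.skipBytes(3); break;                      // MethodHandle
                    default: throw new IOException("unknown constant pool tag " + tag);
                }
            }
            return in.readUnsignedShort();      // access_flags
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int flags = readAccessFlags(java.util.ArrayList.class);
        System.out.println("ACC_SUPER set: " + ((flags & ACC_SUPER) != 0));
    }
}
```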

  • I could add another answer, but we might as well make yours the canonical answer. You may want to mention that a method’s signature in bytecode includes the return type. That is, you can have two methods with exactly the same parameter types, but different return types. See this discussion: stackoverflow.com/questions/3110014/is-this-valid-java/…

    – Adam Paynter

    Jul 26, 2011 at 9:09

  • You can have class, method and field names with just about any character. I worked on one project where the “fields” had spaces and hyphens in their names. 😛

    – Peter Lawrey

    Jul 26, 2011 at 9:15

  • @Peter: Speaking of file system characters, I ran into an obfuscator that had renamed a class to a and another to A inside the JAR file. It took me about half an hour of unzipping on a Windows machine before I realized where the missing classes were. 🙂

    – Adam Paynter

    Jul 27, 2011 at 9:07

  • @JoachimSauer: paraphrased JVM spec, page 75: class names, methods, fields, and local variables can contain any character except '.', ';', '[', or '/'. Method names are the same, but they also can’t contain '<' or '>'. (With the notable exceptions of <init> and <clinit> for instance and static constructors.) I should point out that if you are following the specification strictly, the class names are actually much more constrained, but the constraints are not enforced.

    – leviathanbadger

    May 4, 2013 at 3:27

  • @JoachimSauer: also, an undocumented addition of my own: the java language includes the "throws ex1, ex2, ..., exn" as part of the method signatures; you can’t add exception throwing clauses to overridden methods. BUT, the JVM couldn’t care less. So only final methods are truly guaranteed by the JVM to be exception-free – aside from RuntimeExceptions and Errors, of course. So much for checked exception handling 😀

    – leviathanbadger

    May 4, 2013 at 3:35


Esko Luontola

Here are some features that can be done in Java bytecode but not in Java source code:

  • Throwing a checked exception from a method without declaring that the method throws it. The checked and unchecked exceptions are a thing which is checked only by the Java compiler, not the JVM. Because of this for example Scala can throw checked exceptions from methods without declaring them. Though with Java generics there is a workaround called sneaky throw.

  • Having two methods in a class that only differ in return type, as already mentioned in Joachim’s answer: The Java language specification does not allow two methods in the same class when they differ only in their return type (i.e. same name, same argument list, …). The JVM specification however, has no such restriction, so a class file can contain two such methods, there’s just no way to produce such a class file using the normal Java compiler. There’s a nice example/explanation in this answer.

  • Note that there is a way to do the first thing in Java. It’s sometimes called a sneaky throw.

    – Joachim Sauer

    Jul 26, 2011 at 9:59

  • Now that’s sneaky! 😀 Thanks for sharing.

    – Esko Luontola

    Jul 26, 2011 at 10:12

  • I think you can also use Thread.stop(Throwable) for a sneaky throw. I assume the one already linked is faster though.

    – Bart van Heukelom

    Jul 26, 2011 at 11:03


  • You can’t create an instance without calling a constructor in Java bytecode. The verifier will reject any code which tries to use an uninitialized instance. The object deserialization implementation uses native code helpers for creating instances without constructor calling.

    – Holger

    Aug 27, 2013 at 17:26

  • For a class Foo extending Object, you could not instantiate Foo by calling a constructor that is declared in Object. The verifier would refuse it. You could create such a constructor using Java’s ReflectionFactory but this hardly is a byte code feature but realized by Jni. Your answer is wrong and Holger is correct.

    – Rafael Winterhalter

    Mar 15, 2014 at 22:14


  • GOTO can be used with labels to create your own control structures (other than for while etc)
  • You can override the this local variable inside a method
  • Combining both of these, you can create tail call optimised bytecode (I do this in JCompilo)
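At source level, the effect of that transformation can be imitated by hand: the tail call becomes “reassign the parameters and jump back to the start”. The sketch below only illustrates the idea and makes no claim about how JCompilo actually rewrites byte code; the names are made up.

```java
class TailCallDemo {
    // Tail-recursive form: the recursive call is the method's last action,
    // but each call still consumes a stack frame.
    static long sumRecursive(long n, long acc) {
        if (n == 0) {
            return acc;
        }
        return sumRecursive(n - 1, acc + n);
    }

    // Same logic with the tail call replaced by parameter reassignment
    // and a jump back to the top (the loop plays the role of the GOTO).
    static long sumLoop(long n, long acc) {
        while (true) {
            if (n == 0) {
                return acc;
            }
            acc = acc + n;
            n = n - 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumRecursive(100, 0));
        // A recursion this deep would risk a StackOverflowError; the loop does not.
        System.out.println(sumLoop(1_000_000, 0));
    }
}
```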

As a related point, you can get parameter names for methods if compiled with debug information (Paranamer does this by reading the bytecode).

Maybe section 7A in this document is of interest, although it’s about bytecode pitfalls rather than bytecode features.

  • Interesting read, but it doesn’t look like one would want to (ab)use any of those things.

    – Bart van Heukelom

    Jul 26, 2011 at 9:25

Community

In the Java language, the first statement in a constructor must be a call to the super class constructor. Bytecode does not have this limitation; instead, the rule is that the super class constructor or another constructor in the same class must be called for the object before accessing the members. This should allow more freedom such as:

  • Create an instance of another object, store it in a local variable (or stack) and pass it as a parameter to super class constructor while still keeping the reference in that variable for other use.
  • Call different other constructors based on a condition. This should be possible: How to call a different constructor conditionally in Java?

I have not tested these, so please correct me if I’m wrong.
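For the first bullet, Java source code usually falls back on a private helper constructor: the object is created in the argument expression of this(...), which runs before the super constructor, and the helper constructor keeps the reference. A sketch with hypothetical class names:

```java
import java.util.ArrayList;
import java.util.List;

class ConstructorChainingDemo {
    static class Base {
        final List<String> log;

        Base(List<String> log) {
            this.log = log;
            log.add("Base constructed");
        }
    }

    static class Derived extends Base {
        private final List<String> sameLog;

        Derived() {
            // The list is created in the argument expression, before super(...) runs.
            this(new ArrayList<>());
        }

        private Derived(List<String> log) {
            super(log);          // passed to the super class constructor...
            this.sameLog = log;  // ...and still available afterwards
        }

        boolean sharesLog() {
            return sameLog == log;
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.sharesLog() + " " + d.log);
    }
}
```

In byte code the extra constructor is unnecessary: the reference can simply stay in a local variable across the super constructor call.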

Peter Lawrey

Something you can do with byte code, rather than plain Java code, is generate code which can be loaded and run without a compiler. Many systems have a JRE rather than a JDK, and if you want to generate code dynamically it may be better, if not easier, to generate byte code instead of Java code, which has to be compiled before it can be used.

  • But then you’re just skipping the compiler, not producing something that couldn’t be produced using the compiler (if it were available).

    – Bart van Heukelom

    Jul 26, 2011 at 9:16
