As a simple example, if you make a small type to store the color of a pixel
class RGB {
    float r;
    float g;
    float b;
    ...
}
and then make an `ArrayList<RGB>`, the list will end up using over 2x the memory (roughly 128 bits of object header per element, plus 64 bits for each pointer in the list) compared to a language with value types. What makes this even worse is that, since you are using a list of pointers, you can't easily use SIMD for any of your computations, and accessing elements will be slow since consecutive values are not guaranteed to be adjacent in memory and so often won't be in cache.
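To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch. The class and method names are made up for illustration, and the header, reference and alignment sizes are assumptions passed in as parameters: real numbers depend on the JVM and its flags (for example, HotSpot with compressed pointers typically uses a 12-byte header and 4-byte references). It also ignores the list's own backing array overhead.

```java
// Back-of-the-envelope estimate only; real layouts depend on the JVM and its flags.
// LayoutEstimate and its methods are hypothetical names used for illustration.
final class LayoutEstimate {
    static final int FLOAT_BYTES = Float.BYTES; // 4 bytes per float
    static final int ALIGNMENT = 8;             // assumed object alignment in bytes

    // Round a size up to the next multiple of ALIGNMENT.
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Bytes per pixel when each RGB is its own heap object referenced from a list.
    static long boxedBytesPerPixel(int headerBytes, int referenceBytes) {
        long object = align(headerBytes + 3L * FLOAT_BYTES);
        return object + referenceBytes;
    }

    // Bytes per pixel when the three floats are stored inline (value-type layout).
    static long flatBytesPerPixel() {
        return 3L * FLOAT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("flat: " + flatBytesPerPixel());
        System.out.println("boxed, 16-byte header, 8-byte ref: " + boxedBytesPerPixel(16, 8));
        System.out.println("boxed, 12-byte header, 4-byte ref: " + boxedBytesPerPixel(12, 4));
    }
}
```

Under either set of assumptions the boxed layout comes out at more than twice the 12 bytes of actual pixel data.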
Just to add, the really performance-oriented hot loops can be rewritten in a structure-of-arrays style ("class of arrays"), with three int arrays for the r, g and b values, or even a single array of flattened ints, with a user-friendly wrapper RGB class provided for outside use. With good use of OOP it would not even be ugly. Performant Java code has been written like that for decades, and such code can get really close to C programs.
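A minimal sketch of what that could look like. `PixelBuffer`, `scale` and the other names are hypothetical, and it uses float arrays to match the RGB class earlier in the thread; int arrays would work the same way. The hot loop only touches primitive arrays, and the `RGB` wrapper exists purely for callers outside the hot path.

```java
// Hypothetical names throughout; a sketch of the structure-of-arrays layout.
final class RGB {
    final float r;
    final float g;
    final float b;

    RGB(float r, float g, float b) {
        this.r = r;
        this.g = g;
        this.b = b;
    }
}

final class PixelBuffer {
    // One contiguous primitive array per channel instead of one object per pixel.
    private final float[] r;
    private final float[] g;
    private final float[] b;

    PixelBuffer(int size) {
        r = new float[size];
        g = new float[size];
        b = new float[size];
    }

    int size() {
        return r.length;
    }

    void set(int i, float red, float green, float blue) {
        r[i] = red;
        g[i] = green;
        b[i] = blue;
    }

    // Friendly wrapper for outside callers; it allocates, so keep it out of hot loops.
    RGB get(int i) {
        return new RGB(r[i], g[i], b[i]);
    }

    // Hot loop: simple counted loops over primitive arrays, which are
    // cache friendly and which the JIT may be able to auto-vectorize.
    void scale(float factor) {
        for (int i = 0; i < r.length; i++) r[i] *= factor;
        for (int i = 0; i < g.length; i++) g[i] *= factor;
        for (int i = 0; i < b.length; i++) b[i] *= factor;
    }
}
```

Outside code can keep working with `RGB` objects via `get`, while bulk operations such as `scale` never materialize them.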
Also, there is now a Vector API that lets you use SIMD operations with configurable lane width (and a safe fallback to for loops on processors without the necessary instructions).
To add to adgjlsfhk1’s answer: if you are writing Android games, you are probably using a cross-platform game engine that doesn’t use Java (like Unity), which solves the issue but, more importantly, also allows you to sell your game on iOS without a rewrite.
Look up "data oriented design" GDC talks or "AoS and SoA" for more in-depth information on your question. The very brief TL;DR is that if you want to go fast, you have to design for cache locality and memory access patterns.
In particular, in Java there are regularly dependent pointer loads, which are dreadfully slow on modern CPUs and also waste a significant amount of L1/L2 cache.
If you work on Android games, for example, would you get any kind of control over memory layout, irrespective of the stack you are using?