Julia Language and Its Deep Learning Ecosystem
Table of Contents
I heard about Swift and Julia for deep learning during fast.ai's course. Then I spent quite some time to try figuring out whether any of these languages is a better choice for deep learning than Python. However, soon I realized that Chris Lattner left Google and Swift for Tensorflow project is slowing down and even Jeremy Howard is less passionate about it. Then the only choice left for now is Julia AFAIK. The Question is whether Julia is ready for deep learning.
Memory Management
Garbage Collector
I noticed this issue when Chris Lattner pointed during a debate. In short, Julia uses the generational mark-and-sweep Garbage Collector. It may cause memory not being released in time. This is especially problematic for GPU since its memory is much smaller than CPU memory. The good news is there is some workaround, but it would be better if Julia has some deterministic memory management. There is an ongoing discussion about this.
Heap or Stack
As a high level language, Julia also does not allow the programmer to explicitly control whether the memory should be allocated on stack or heap. Instead, it is up to the complier to optimize for you.
Speed
I guess most people come to Julia for speed. However, when optimizing your code for speed, it is not that straightforward.
Avoid allocation
One problem is to avoid heap allocation. As I mentioned in the previous section. Heap or stack allocation cannot be controlled explicitly. Then it is kind like play a guessing game with the compiler to avoid allocation. The good news is, there are some performance tips.
Type Stable
Another problem is type inference. Julia is a dynamic typing language; however, to get the best speed, you are encouraged to have type stability. My understanding of this is as following: You can you Julia just a dynamic type language (such as Python), and have all the easy-of-mind and fast-prototyping. However, when you need it to be as fast as possible, you can use Julia as if it is a static type language, take care of the type stability (kind of like the generic type in a static type language like C++). However, the only con I see is that unlike a static type language that does type check at compile time and yell at you when it fails at type check. You need to call a few macros yourself to check whether it is type stable.
Type Annotation
Note that type annotation does not help if the compiler can figure out a stable type. In that case, annotating it to a narrower type only makes your function less reusable.
Multiple Dispatch
As a rule of thumb, compile time multiple dispatch is as fast as C/C++, Julia can do all the optimizations like inlining. However, runtime multiple dispatch needs to do a lookup and it is slow (can be slower than runtime if-else check). but runtime-multiple dispatch is much more powerful than the single/double dispatch in C++/Python.
Compile-time dispatch
when the type is known at compile time, this feature is typical in a static type language (C++ for example), it is called function overloading. Julia and C++ can both do heavy optimization.
Runtime dispatch
When it is not known at compile time. C++ can do dynamic single dispatch with virtual table. C++ can also do runtime double dispatch with a design pattern. And I don't hear people mention runtime mutiple (>2) dispatch in C++, but I think it is possible by implementing some kind of virtual table yourself. For this runtime/dynamic multiple dispatch case, I think Julia does a lookup at runtimes. So it is kind of similar to the vtable idea, but the implementation of the lookup is different. "when types can't be predicted in advance, you're better off doing more at runtime and less via the type system. Apparently, Tim Holy is saying the runtime multiple dispatch can be slower than just doing the conditional check at runtime.
Ahead of Time (AOT) Compilation vs. Just In Time (JIT) Compilation vs. Interpreter
Usually, static type languages are AOT compiled while dynamic language runs on an interpreter, and sometimes with a JIT compiler. Because you need to know the type to deeply optimize the code, static type languages can directly be optimized AOT, while dynamic typing languages can only to optimized at runtime when the type is known with a JIT. Julia is somewhere in between. It is just ahead of time (JAOT) compiled. Julia does type inference as early as it can. But for a dynamic type language, inevitably, the type cannot be inferred before runtime. The pro is that you have a deeply optimized dynamic typing language. The con is this JAOT takes a long time, and this is the well known time to first plot issue. This issue is not a big deal if you use Julia for deep learning. However, dynamic typing is something built in the design of the language, while you enjoy the ease of use, you also have to bear with the awkwardness it brings: 1) long JAOT compilation time; 2) hard to build into libraries and executables. Luckily, both of these two are being actively worked on (PackageCompiler.jl, StaticCompiler.jl) but fully ready yet.
Community
Julia has a large scientific computing community. For deep learning, there is also a community working on the machine learning ecosystem.
Solve Two Language Problem
I think this is a pretty unique point about Julia. People are proud of using Julia from high-level all the way to low-level programming. The fact that Julia can be used to do CUDA programming just blew my mind. Keep in mind that when you get into the advanced metaprogramming in Julia, it feels quite different compared to the part of Julia you normally use but it is technically still Julia language.
Deep Learning Library and Auto Differentiation (AD)
I guess a lot of people care about this the most. The reason I put it at the very end is that it is not very mature yet. The most promising deep learning library is called Flux.jl and the AD it uses is called Zygote.jl. They are still undergoing some changes. As far as I know, there may be a new AD coming out to replace Zygote. Unlike Pytorch, where AD only works on Pytorch's Tensor class, Zygote is a source to source AD that does not need to introduce another Tensor class, it just works on the existing data structure. This can be a huge advantage for developers. In Pytorch ecosystem, to be differentiable, developers need to re-write their libraries using Pytorch's Tensor. For example, Kornia is a re-write of OpenCV using Pytorch. This will not happen to Julia package developer since the AD will just work.
To summary up, IMHO, this is a very good time for early adopters and people who want to make contributions. However, if you just want to get things done, or you want the current best (fast and fully functional) solution, Julia is not ready yet.