Natural Language Translator
Recent advances in computer science are improving both the speed and the accuracy of automatic translation services: topologies such as NMT (neural machine translation) can take advantage of newer, more efficient hardware based on CPU architectures such as Intel Skylake.
These technologies express their full potential when the software is fully optimized to match the hardware platform.
The team analyzed both the high-level model topology (Python code) and the low-level architecture layers in order to optimize a neural-network-based system for the inference step, that is, the actual execution of the translation.
We used C++ and Assembly to optimize a network topology that translates text from one language to another, targeting Intel Xeon CPUs based on the latest Skylake architecture. The optimization relied on quantization and on careful use of the CPU caches, taking advantage of AVX-512 hardware support. This process required a deep understanding of both the hardware architecture and the network topology.
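To illustrate the quantization idea, here is a minimal sketch that assumes symmetric per-tensor int8 quantization; the actual scheme used in the project is not described here, and the names `QuantizedVector`, `quantize`, and `quantized_dot` are hypothetical. It is written in portable C++ without intrinsics. An AVX-512 build would process many int8 values per instruction, but the arithmetic would follow the same pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: int8 data plus the scale that maps it back to float.
struct QuantizedVector {
    std::vector<std::int8_t> values;
    float scale;  // real value is approximately scale * values[i]
};

// Symmetric per-tensor quantization: map [-max_abs, max_abs] onto [-127, 127].
QuantizedVector quantize(const std::vector<float>& x) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));

    QuantizedVector q;
    q.scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;  // avoid dividing by zero
    q.values.reserve(x.size());
    for (float v : x) {
        long r = std::lround(v / q.scale);
        r = std::max(-127L, std::min(127L, r));  // clamp to the int8 range we use
        q.values.push_back(static_cast<std::int8_t>(r));
    }
    return q;
}

// Dot product on int8 data with int32 accumulation; rescaled to float once at the end.
float quantized_dot(const QuantizedVector& a, const QuantizedVector& b) {
    assert(a.values.size() == b.values.size());
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < a.values.size(); ++i) {
        acc += static_cast<std::int32_t>(a.values[i]) *
               static_cast<std::int32_t>(b.values[i]);
    }
    return static_cast<float>(acc) * a.scale * b.scale;
}
```

Storing weights as 8-bit integers instead of 32-bit floats makes the data four times smaller, so more of the model fits in the CPU caches. The price is a small rounding error in each result, which is where the speed vs. accuracy trade-off comes from.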
We have also started implementing support for the next generation of CPUs in order to achieve an even better speed/accuracy trade-off.
We created a flexible solution with multiple levels of optimization, where each successive level offers higher speed at the cost of lower accuracy. Users can choose among these levels to set their own speed vs. accuracy trade-off, as you can simulate below.
- from 8 seconds to 0.11 seconds for 1 sentence (roughly a 73x speedup)
- 17 levels of optimization to choose from
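
Choosing among the optimization levels can be pictured as picking the fastest level that still meets a required accuracy. The sketch below is only an illustration: the `OptimizationLevel` struct, its fields, and the `pick_level` function are hypothetical names, not part of the actual product, and the accuracy score can be any metric where higher is better.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one optimization level; field names are illustrative.
struct OptimizationLevel {
    int id;                       // level identifier
    double seconds_per_sentence;  // measured latency for this level
    double accuracy;              // quality score, higher is better
};

// Return the index of the fastest level whose accuracy is at least min_accuracy,
// or -1 if no level qualifies.
int pick_level(const std::vector<OptimizationLevel>& levels, double min_accuracy) {
    int best = -1;
    for (std::size_t i = 0; i < levels.size(); ++i) {
        if (levels[i].accuracy < min_accuracy) continue;  // too inaccurate for this user
        if (best < 0 ||
            levels[i].seconds_per_sentence < levels[best].seconds_per_sentence) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

A caller would fill the vector with measured latency and accuracy figures for each available level, then pass the lowest accuracy it is willing to accept.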