Jetson Nano Brings AI Computing to Everyone

At the 2019 NVIDIA GPU Technology Conference (GTC), NVIDIA announced the Jetson Nano Developer Kit, a $99 computer available now for embedded designers, researchers, and DIY makers. It delivers the power of modern AI in a compact, easy-to-use platform with full software programmability. Jetson Nano provides 472 GFLOPS of compute performance with a quad-core 64-bit ARM CPU and a 128-core integrated NVIDIA GPU. It also includes 4GB LPDDR4 memory in an efficient, low-power package with 5W/10W power modes and 5V DC input, as shown in figure 1.
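The 472 GFLOPS figure can be sanity-checked with a short calculation. The sketch below relies on details that are not stated in this article: a peak GPU clock of about 921.6 MHz, one fused multiply-add (two floating-point operations) per core per cycle, and FP16 running at twice the FP32 rate. Treat those values as assumptions to verify against the module datasheet.

```python
# Back-of-the-envelope peak throughput for the 128-core integrated GPU.
CUDA_CORES = 128          # from the article
GPU_CLOCK_GHZ = 0.9216    # ASSUMPTION: peak GPU clock, not stated in the article
FLOPS_PER_FMA = 2         # a fused multiply-add counts as two operations
FP16_RATE = 2             # ASSUMPTION: FP16 issues at twice the FP32 rate


def peak_gflops(cores, clock_ghz, flops_per_cycle, rate_multiplier=1):
    """Theoretical peak in GFLOPS: cores x clock x operations per cycle."""
    return cores * clock_ghz * flops_per_cycle * rate_multiplier


fp32_peak = peak_gflops(CUDA_CORES, GPU_CLOCK_GHZ, FLOPS_PER_FMA)
fp16_peak = peak_gflops(CUDA_CORES, GPU_CLOCK_GHZ, FLOPS_PER_FMA, FP16_RATE)

print(f"FP32 peak: {fp32_peak:.1f} GFLOPS")
print(f"FP16 peak: {fp16_peak:.1f} GFLOPS")
```

Under these assumptions the FP16 result rounds to the quoted 472 GFLOPS. It is a theoretical peak, so real workloads will land below it.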

The newly released JetPack 4.2 SDK provides a complete desktop Linux environment for Jetson Nano based on Ubuntu 18.04 with accelerated graphics, support for NVIDIA CUDA Toolkit 10.0, and libraries such as cuDNN 7.3 and TensorRT 5. The SDK also includes the ability to natively install popular open source Machine Learning (ML) frameworks such as TensorFlow, PyTorch, Caffe, Keras, and MXNet, along with frameworks for computer vision and robotics development like OpenCV and ROS.

Full compatibility with these frameworks and NVIDIA’s leading AI platform makes it easier than ever to deploy AI-based inference workloads to Jetson. Jetson Nano brings real-time computer vision and inferencing across a wide variety of complex Deep Neural Network (DNN) models. These capabilities enable multi-sensor autonomous robots, IoT devices with intelligent edge analytics, and advanced AI systems. Even transfer learning is possible for re-training networks locally onboard Jetson Nano using the ML frameworks.

The Jetson Nano Developer Kit fits in a footprint of just 80x100mm and features four high-speed USB 3.0 ports, a MIPI CSI-2 camera connector, HDMI 2.0 and DisplayPort 1.3, Gigabit Ethernet, an M.2 Key-E module, a MicroSD card slot, and a 40-pin GPIO header. The ports and GPIO header work out-of-the-box with a variety of popular peripherals, sensors, and ready-to-use projects, such as the 3D-printable deep learning JetBot that NVIDIA has open-sourced on GitHub.

The devkit boots from a removable MicroSD card, which can be formatted and imaged from any PC with an SD card adapter. The devkit can be powered via either the Micro USB port or a 5V DC barrel jack adapter. The camera connector is compatible with affordable MIPI CSI sensors, including modules based on the 8MP IMX219, available from Jetson ecosystem partners. The Raspberry Pi Camera Module v2 is also supported, with driver support included in JetPack. Table 1 shows key specifications.

The devkit is built around a 260-pin SODIMM-style System-on-Module (SoM), shown in figure 2. The SoM contains the processor, memory, and power management circuitry. The Jetson Nano compute module is 45x70mm and will be shipping starting in June 2019 for $129 (in 1000-unit volume) for embedded designers to integrate into production systems. The production compute module will include 16GB eMMC onboard storage and enhanced I/O with PCIe Gen2 x4/x2/x1, MIPI DSI, additional GPIO, and 12 lanes of MIPI CSI-2 for connecting up to three x4 cameras or up to four cameras in x4/x2 configurations. Jetson’s unified memory subsystem, which is shared between CPU, GPU, and multimedia engines, provides streamlined ZeroCopy sensor ingest and efficient processing pipelines.

Deep Learning Inference Benchmarks

Jetson Nano can run a wide variety of advanced networks, including those built with the full native versions of popular ML frameworks like TensorFlow, PyTorch, Caffe/Caffe2, Keras, MXNet, and others. These networks can be used to build autonomous machines and complex AI systems by implementing robust capabilities such as image recognition, object detection and localization, pose estimation, semantic segmentation, video enhancement, and intelligent analytics.

Figure 3 shows results from inference benchmarks across popular models available online. See here for the instructions to run these benchmarks on your Jetson Nano. The inferencing used batch size 1 and FP16 precision, employing NVIDIA’s TensorRT accelerator library included with JetPack 4.2. Jetson Nano attains real-time performance in many scenarios and is capable of processing multiple high-definition video streams.
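The benchmarks above ran in FP16. As a rough illustration of what that precision means for storage, the following sketch uses Python's standard `struct` module (format code `e` is IEEE 754 half precision) to compare value sizes and show the rounding involved. The 10-million-parameter model is a hypothetical figure chosen for illustration, not one of the benchmarked networks.

```python
import struct

FP32_BYTES = struct.calcsize("<f")  # size of one single-precision value
FP16_BYTES = struct.calcsize("<e")  # size of one half-precision value


def to_fp16_and_back(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def weight_megabytes(num_params, bytes_per_value):
    """Storage needed for a model's weights, in megabytes (10^6 bytes)."""
    return num_params * bytes_per_value / 1e6


HYPOTHETICAL_PARAMS = 10_000_000  # ASSUMPTION: illustrative model size

print(f"FP32 weights: {weight_megabytes(HYPOTHETICAL_PARAMS, FP32_BYTES):.0f} MB")
print(f"FP16 weights: {weight_megabytes(HYPOTHETICAL_PARAMS, FP16_BYTES):.0f} MB")
print(f"0.1 stored as FP16 reads back as {to_fp16_and_back(0.1)!r}")
```

Halving the bytes per value halves the memory traffic for weights and activations, at the cost of a small rounding error on each value.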

Figure 3. Performance of various deep learning inference networks with Jetson Nano and TensorRT, using FP16 precision and batch size 1

Table 2 provides full results, including the performance of other platforms like the Raspberry Pi 3, Intel Neural Compute Stick 2, and Google Edge TPU Coral Dev Board:

DNR (did not run) results occurred frequently due to limited memory capacity, unsupported network layers, or hardware/software limitations. Fixed-function neural network accelerators often support a relatively narrow set of use cases: layer operations are implemented in dedicated hardware, and network weights and activations must fit in limited on-chip caches to avoid significant data transfer penalties. They may fall back on the host CPU to run layers unsupported in hardware and may rely on a model compiler that supports a reduced subset of a framework (TFLite, for example).

Jetson Nano’s flexible software, full framework support, memory capacity, and unified memory subsystem enable it to run a myriad of different networks up to full HD resolution, including variable batch sizes on multiple sensor streams concurrently. These benchmarks represent a sampling of popular networks, but users can deploy a wide variety of models and custom architectures to Jetson Nano with accelerated performance. Jetson Nano is also not limited to DNN inferencing. Its CUDA architecture can be leveraged for computer vision and Digital Signal Processing (DSP), using algorithms including FFTs, BLAS, and LAPACK operations, along with user-defined CUDA kernels.

Multi-Stream Video Analytics

Jetson Nano processes up to eight HD full-motion video streams in real-time and can be deployed as a low-power edge intelligent video analytics platform for Network Video Recorders (NVR), smart cameras, and IoT gateways. NVIDIA’s DeepStream SDK optimizes the end-to-end inferencing pipeline with ZeroCopy and TensorRT to achieve ultimate performance at the edge and for on-premises servers. The video below shows Jetson Nano performing object detection on eight 1080p30 streams simultaneously with a ResNet-based model running at full resolution and a throughput of 500 megapixels per second (MP/s).

The block diagram in figure 4 shows an example NVR architecture using Jetson Nano for ingesting and processing up to eight digital streams over Gigabit Ethernet with deep learning analytics. The system can decode 500 MP/s of H.264/H.265 and encode 250 MP/s of H.264/H.265 video.
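The megapixel-per-second figures above follow from simple arithmetic. The sketch below assumes that each HD stream is 1920x1080 at 30 frames per second (matching the 1080p30 streams mentioned earlier) and that a megapixel is 10^6 pixels. The function name is illustrative.

```python
def megapixels_per_second(width, height, fps, streams=1):
    """Pixel throughput of one or more video streams, in MP/s."""
    return width * height * fps * streams / 1e6


# ASSUMPTION: each HD stream is 1920x1080 at 30 frames per second.
per_stream = megapixels_per_second(1920, 1080, 30)
eight_streams = megapixels_per_second(1920, 1080, 30, streams=8)

DECODE_BUDGET_MPS = 500  # from the article
ENCODE_BUDGET_MPS = 250  # from the article

# Whole 1080p30 streams that fit within each budget.
decode_streams = int(DECODE_BUDGET_MPS // per_stream)
encode_streams = int(ENCODE_BUDGET_MPS // per_stream)

print(f"One 1080p30 stream:    {per_stream:.1f} MP/s")
print(f"Eight 1080p30 streams: {eight_streams:.1f} MP/s")
print(f"Decode budget fits {decode_streams} streams, encode budget fits {encode_streams}")
```

Eight streams come to just under 500 MP/s, which lines up with the eight-stream decode figure. The encode budget corresponds to roughly half as many 1080p30 streams.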

Figure 4. Reference NVR system architecture with Jetson Nano and 8x HD camera inputs

JetBot

NVIDIA JetBot, shown in figure 5, is a new open source autonomous robotics kit that provides all the software and hardware plans to build an AI-powered deep learning robot for under $250. The hardware materials include Jetson Nano, an IMX219 8MP camera, a 3D-printable chassis, a battery pack, motors, an I2C motor driver, and accessories.

Hello AI World

Hello AI World offers a great way to start using Jetson and experiencing the power of AI. In just a couple of hours, you can have a set of deep learning inference demos up and running for real-time image classification and object detection (using pre-trained models) on the Jetson Nano Developer Kit with JetPack SDK and NVIDIA TensorRT. The tutorial focuses on networks related to computer vision and includes the use of live cameras. You also get to code your own easy-to-follow recognition program in C++. Available Deep Learning ROS Nodes integrate these recognition, detection, and segmentation inferencing capabilities with ROS for incorporation into advanced robotic systems and platforms. These real-time inferencing nodes can easily be dropped into existing ROS applications. Figure 6 highlights some examples.

Figure 6. Hello AI World examples: image recognition (classification), object detection (localization), and segmentation (free space)

Developers who want to try training their own models can follow the full “Two Days to a Demo” tutorial, which covers the re-training and customization of image classification, object detection, and semantic segmentation models with transfer learning. Transfer learning fine-tunes the model weights for a particular dataset and avoids having to train the model from scratch. Transfer learning is most effectively performed on a PC or cloud instance with an NVIDIA discrete GPU attached, since training requires more computational resources and time than inferencing.
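To make the idea concrete, here is a toy, framework-free sketch of one common transfer-learning variant: the pre-trained layers stay frozen and only the final layer is re-trained on new data. It is not the tutorial's code and does not use PyTorch. The `frozen_backbone` function, the dataset, and the hyperparameters are all made up for illustration.

```python
import math


def frozen_backbone(x):
    """Hypothetical stand-in for pre-trained layers; never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def retrain_head(inputs, labels, steps=100, lr=0.1):
    """Fit only the final logistic layer with full-batch gradient descent."""
    # The backbone is frozen, so its features are computed once and reused.
    features = [frozen_backbone(x) for x in inputs]
    w, b = [0.0, 0.0], 0.0
    n = len(features)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in zip(features, labels):
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            grad_w[0] += err * f[0] / n
            grad_w[1] += err * f[1] / n
            grad_b += err / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(x, w, b):
    f = frozen_backbone(x)
    return 1 if sigmoid(w[0] * f[0] + w[1] * f[1] + b) >= 0.5 else 0


# Made-up dataset: label 1 when the first value exceeds the second.
inputs = [(2, 1), (3, 0), (1, 2), (0, 3)]
labels = [1, 1, 0, 0]

w, b = retrain_head(inputs, labels)
predictions = [predict(x, w, b) for x in inputs]
print(predictions)
```

Because only the small final layer is trained, each step is far cheaper than training the whole network. That is part of why re-training is feasible on modest hardware, although full fine-tuning of all weights takes more compute.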

However, since Jetson Nano can run the full training frameworks like TensorFlow, PyTorch, and Caffe, it’s also able to re-train with transfer learning for those who may not have access to another dedicated training machine and are willing to wait longer for results. Table 3 highlights some initial results of transfer learning from the Two Days to a Demo tutorial with PyTorch using Jetson Nano for training AlexNet and ResNet-18 on a 200,000-image, 22.5GB subset of ImageNet:

AI for Everyone

The compute performance, compact footprint, and flexibility of Jetson Nano bring endless possibilities to developers for creating AI-powered devices and embedded systems. Get started today with the Jetson Nano Developer Kit for only $99, which will be sold through our main global distributors and can also be purchased from the maker channels Seeed Studio and SparkFun. Visit our Embedded Developer Zone to download the software and documentation, and browse open-source projects available for Jetson Nano. Join the community on the Jetson DevTalk forums for support, and be sure to share your projects. We look forward to seeing what you create!

By Dustin Franklin
https://devblogs.nvidia.com/jetson-nano-ai-computing/