How NVIDIA Uses Machine Learning, and What Is Its Future Plan?

Looking at the history of GPUs, NVIDIA is the company that, in the truest sense, introduced the modern-day graphics processing unit. We are all familiar with NVIDIA's recent move to acquire Arm. They plan to use Arm's well-established marketplace to deploy their powerful GPU-based AI. By doing this, they will bring AI from the cloud closer to the people.

In this article, we will see how NVIDIA brings machine learning onto the processor through one of its products, RAPIDS.

NVIDIA provides a well-maintained and efficient bundle of machine learning and analytics software modules/libraries to accelerate the end-to-end data science pipeline on top of the GPU. The backbone of such a complex and efficient tool is their 15 years of hard work developing CUDA.

CUDA: It stands for Compute Unified Device Architecture. It is a parallel computing platform and API designed so that developers can use it easily. It enables developers to use the GPU for general-purpose programs, and the benefit is that the GPU is far more powerful for parallel computation. This approach is termed GPGPU (General-Purpose computing on Graphics Processing Units). The reason it is so powerful is that it allows direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.

Process flow on CUDA

Source: Wikipedia
  1. The CPU sends an instruction to initiate the GPU compute kernel.
  2. The GPU's CUDA cores execute the kernel in parallel.
  3. The resulting data is copied back into main memory.
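This host-dispatch pattern can be sketched in plain Python. To be clear, this is a CPU-only analogy, not real CUDA: the "host" prepares data, hands elements to parallel workers standing in for CUDA cores, and gathers the results back.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(x):
    """Stand-in for a GPU compute kernel: transform one element."""
    return x * x

# 1. The "host" (CPU) prepares the input data.
data = list(range(8))

# 2. Work is dispatched to parallel workers (analogous to CUDA cores,
#    each of which would handle one element of the array).
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(kernel, data))

# 3. Results are gathered back into host ("main") memory.
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The real speedup on a GPU comes from running thousands of such lightweight threads at once, one per data element, rather than the handful of threads a CPU pool provides.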


RAPIDS: It is an open-source suite of libraries and APIs that gives us the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. RAPIDS uses Apache Arrow, a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytical operations on modern hardware (both CPUs and GPUs), along with NVIDIA CUDA primitives for low-level compute optimization. It exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
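The columnar idea behind Arrow can be illustrated in plain Python. This sketch only shows the row-to-column transposition; the actual library exposes this through structures such as `pyarrow.Table`, with a carefully specified binary memory layout.

```python
# Row-oriented records, as a typical CPU application might produce them.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 3.0},
    {"id": 3, "price": 7.25},
]

# Columnar layout: each field becomes one contiguous array, which is
# what Arrow stores and what GPUs can scan and aggregate efficiently.
columns = {key: [row[key] for row in rows] for key in rows[0]}

print(columns["id"])     # [1, 2, 3]
print(columns["price"])  # [9.5, 3.0, 7.25]
```

Because every library in the pipeline agrees on this columnar layout, data can move between cuDF, cuML, and other tools without repeated serialization.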

Architecture of RAPIDS

Source: Developer NVIDIA Blog

This is called the data science pipeline. In the industrial world, it is not just AI/ML or operations teams working in isolation; to achieve agility and a shorter time-to-market, we follow the DevOps culture, where we create pipelines or flows of jobs/tasks that are automated and can be handled by individual teams as well as by a single team of engineers with all the required skills. RAPIDS adopts this kind of culture, which is reflected in its pipeline.

In this architecture, we have data preparation, then model training, and finally visualization. To show this flow with more insight:


Data preparation: In terms of a data scientist or AI/ML engineer, this is the data collection/extraction part of the flow. We can use ETL tools to fetch data from a data lake, a warehouse, or some other source. ETL refers to Extract-Transform-Load; one example of such a toolset is the ELK Stack (Elasticsearch, Logstash, Kibana). Using an ETL tool, we extract data from a source and then perform operations on it to make the data suitable for analysis or analytics; this is the transform step. Finally, we load it into some other place from which we can take the data further for analysis or analytics.
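A minimal ETL sketch in plain Python follows (the field names and the in-memory "source" are made up for illustration; in RAPIDS the same steps would use cuDF's `read_csv` and DataFrame operations on the GPU):

```python
import csv
import io

# Extract: read raw records (an in-memory CSV standing in for a
# data lake / warehouse export).
raw = io.StringIO("name,salary\nalice,50000\nbob,invalid\ncarol,72000\n")
records = list(csv.DictReader(raw))

# Transform: drop malformed rows and cast types so the data is fit
# for analysis.
clean = [
    {"name": r["name"], "salary": int(r["salary"])}
    for r in records
    if r["salary"].isdigit()
]

# Load: write the cleaned data to its destination (here another buffer).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "salary"])
writer.writeheader()
writer.writerows(clean)

print(len(clean))  # 2 rows survive the transform step
```

The structure — extract, clean/transform, load — is the same whether the tool is Logstash, a Python script, or a GPU-accelerated DataFrame library.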

Model Training: In this phase, the data collected during preparation needs further processing; we can say we are doing feature engineering, feature extraction, and feature selection. Once we have the right data for training the model, we start the model training.
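As a toy illustration of this phase (pure Python, made-up data; cuML would do the same kind of fitting at GPU scale), here is a tiny feature-engineering step followed by training a one-variable linear model with the closed-form least-squares solution:

```python
# Toy data: one feature x, one target y (roughly y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

# Feature engineering: center the feature around its mean.
mean_x = sum(xs) / len(xs)
centered = [x - mean_x for x in xs]

# Model training: closed-form least squares for slope and intercept.
mean_y = sum(ys) / len(ys)
slope = (
    sum(cx * (y - mean_y) for cx, y in zip(centered, ys))
    / sum(cx * cx for cx in centered)
)
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 1.95 1.15
```

Real pipelines repeat this loop — engineer features, fit, evaluate — many times, which is exactly where GPU acceleration pays off.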

Visualization: From the trained model, we plot graphs of the predicted values; this is called visualization. Once we have an acceptable accuracy and the insights we need from the data, we pass the model on to be used in the production environment; this is called deployment.

This complete flow is backed by CUDA, and the whole pipeline works on top of Apache Arrow.

cuDF: A GPU DataFrame library with a pandas-like API. cuDF provides operations on data columns, including unary and binary operations, filters, joins, and groupbys. cuDF currently comprises the Python library PyGDF and the C++/CUDA GPU DataFrames implementation in libgdf. These two libraries are being merged into cuDF.
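Because cuDF deliberately mirrors the pandas API, a pandas snippet shows the programming model; on a GPU, the typical change is just importing cuDF in place of pandas (assuming cuDF supports the operations used). The data here is invented for illustration:

```python
import pandas as pd  # with cuDF installed: `import cudf as pd`

df = pd.DataFrame({
    "dept": ["a", "a", "b", "b"],
    "sales": [10, 20, 30, 40],
})

# Filter, then groupby + aggregate: the column operations described
# above, which cuDF would execute on the GPU with the same calls.
big = df[df["sales"] > 10]
totals = big.groupby("dept")["sales"].sum()

print(totals.to_dict())  # {'a': 20, 'b': 70}
```

Keeping the API identical is what lets existing pandas pipelines move to the GPU with minimal code changes.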

cuML: C++/CUDA machine learning algorithms. This library currently includes the following five algorithms:
  a. Single-GPU Truncated Singular Value Decomposition (tSVD)
  b. Single-GPU Principal Component Analysis (PCA)
  c. Single-GPU Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  d. Single-GPU Kalman filtering
  e. Multi-GPU K-Means clustering
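To make the PCA entry concrete, here is a minimal NumPy sketch of what a PCA implementation computes — center the data, then take the top eigenvector of the covariance matrix. This is only the underlying math; cuML's actual API follows scikit-learn's `fit`/`transform` convention, and the data below is made up.

```python
import numpy as np

# Toy data: points spread mostly along the diagonal y ≈ x.
X = np.array([[2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 4.8]])

# Center the data, then eigendecompose the covariance matrix.
Xc = X - X.mean(axis=0)
cov = (Xc.T @ Xc) / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The first principal component is the eigenvector with the largest
# eigenvalue: here roughly the diagonal direction.
pc1 = eigvecs[:, -1]
print(np.round(np.abs(pc1), 2))
```

tSVD is closely related: it works on the (uncentered) data matrix directly and keeps only the top few singular vectors.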

cuGraph: a collection of graph analytics libraries that seamlessly integrate into the RAPIDS data science platform.
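A simple example of the kind of analytic cuGraph accelerates is breadth-first search. The pure-Python sketch below (tiny invented graph) shows the algorithm; cuGraph runs such traversals on GPU over graphs with millions of edges.

```python
from collections import deque

# Tiny directed graph as an adjacency list (made-up example).
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}

def bfs_depths(graph, source):
    """Breadth-first search: hop distance from the source to each
    reachable node, a classic graph analytic."""
    depths = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in depths:
                depths[nbr] = depths[node] + 1
                queue.append(nbr)
    return depths

print(bfs_depths(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```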

cuDNN: a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

RTX: The RTX platform includes dedicated RT Cores for ray tracing and Tensor Cores for AI, enabling groundbreaking technologies at breakthrough speed.

There are some more libraries that boost performance.

cuSKL: a suite of libraries that implements machine learning algorithms within the RAPIDS data science ecosystem. cuSKL enables data scientists, researchers, and software engineers to run traditional ML tasks on GPUs without going into the details of CUDA programming.

XGBoost: an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Kubernetes, Hadoop, SGE, MPI, Dask) and can solve problems with billions of examples.
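The core idea of gradient boosting — fit each new weak learner to the residuals of the ensemble so far — can be sketched in pure Python with single-split "stump" learners. This is a toy for squared error on invented data; real XGBoost adds regularized trees, parallelism, and far more.

```python
# Toy regression data: y roughly equals x, with noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.8, 5.2, 6.1]

def fit_stump(xs, residuals):
    """Best single-split stump minimizing squared error on residuals."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum(
            (r - (lv if x <= split else rv)) ** 2
            for x, r in zip(xs, residuals)
        )
        if best is None or err < best[0]:
            best = (err, split, lv, rv)
    return best[1:]

def boost(xs, ys, rounds=20, lr=0.5):
    """Gradient boosting for squared error: each round fits a stump
    to the current residuals and adds it with a learning rate."""
    pred = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        split, lv, rv = fit_stump(xs, residuals)
        pred = [p + lr * (lv if x <= split else rv) for x, p in zip(xs, pred)]
    return pred

pred = boost(xs, ys)
mse = sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)
print(round(mse, 4))  # small: the ensemble has fit the training data
```

XGBoost's GPU support parallelizes exactly the expensive inner step — evaluating candidate splits — across thousands of cores.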

To summarize this:

  • MACHINE LEARNING: cuML is a collection of GPU-accelerated machine learning libraries that aims to provide GPU versions of the machine learning algorithms available in scikit-learn.
  • GRAPH ANALYTICS: cuGRAPH is a collection of graph analytics libraries that seamlessly integrate into the RAPIDS data science platform.

In a nutshell, we can say that using these libraries reduces the time that is usually wasted waiting for results and re-training the model for better accuracy after changing the hyper-parameters. We can visualize this using the simple image provided by NVIDIA Developers.


Future plan:

As I mentioned earlier in this article (I have also written an article on how Arm is using Amazon Web Services (AWS)), NVIDIA's future goal is to deploy GPU-accelerated AI/ML programs on top of Arm chips, which have already captured a great share of the market. In doing so, AI/ML will come from the cloud to the people.



C|EH | Cybersecurity researcher | MLOps | Hybrid Multi Cloud | Devops assembly line | Openshift | AWS EKS | Docker

