<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Hypervisor: Hardware]]></title><description><![CDATA[A series of posts on the topic of Hardware]]></description><link>https://www.thehypervisor.blog/s/firmware-device-drivers</link><image><url>https://substackcdn.com/image/fetch/$s_!eCSK!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf7f1646-9248-40c8-a717-6f1ed8bb8d61_800x800.png</url><title>The Hypervisor: Hardware</title><link>https://www.thehypervisor.blog/s/firmware-device-drivers</link></image><generator>Substack</generator><lastBuildDate>Tue, 30 Jun 2026 05:40:36 GMT</lastBuildDate><atom:link href="https://www.thehypervisor.blog/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Matthew Leone]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[thehypervisor@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[thehypervisor@substack.com]]></itunes:email><itunes:name><![CDATA[Matthew Leon]]></itunes:name></itunes:owner><itunes:author><![CDATA[Matthew Leon]]></itunes:author><googleplay:owner><![CDATA[thehypervisor@substack.com]]></googleplay:owner><googleplay:email><![CDATA[thehypervisor@substack.com]]></googleplay:email><googleplay:author><![CDATA[Matthew Leon]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Running your first CUDA kernel]]></title><description><![CDATA[Introduction:]]></description><link>https://www.thehypervisor.blog/p/running-your-first-cuda-kernel</link><guid isPermaLink="false">https://www.thehypervisor.blog/p/running-your-first-cuda-kernel</guid><dc:creator><![CDATA[Matthew 
Leon]]></dc:creator><pubDate>Fri, 08 Aug 2025 01:32:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pmz0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Introduction:</h1><p>CUDA, short for Compute Unified Device Architecture, is Nvidia's parallel computing platform and programming model. It lets developers run general-purpose computations on GPUs (Graphics Processing Units), so an Nvidia GPU can take on computationally expensive tasks that would otherwise run on the CPU.</p><h2>Difference between GPU and CPU for parallel processing:</h2><p>You might be asking yourself why CUDA and GPU programming matter. CPUs are built for general-purpose computing: a relatively small number of powerful cores that are fast at sequential, branch-heavy code. GPUs instead have thousands of simpler cores designed to apply the same operation to many pieces of data at once. If you need to add two very large vectors or multiply matrices, a modern GPU can perform thousands of the element-wise operations in parallel, whereas a CPU works through them mostly serially, or spread across a handful of cores if the program is multi-threaded.</p><p>This post is a basic tutorial on getting CUDA running on a GPU and running a custom kernel that adds two vectors.</p><p>For this exercise, you should have a reasonably modern Nvidia GPU. Something like an RTX 5050 or even a GTX 1080 will work on Ubuntu 22.04, although newer CUDA releases have started dropping support for older architectures, so an older card such as the GTX 1080 may need a CUDA 12.x toolkit rather than the latest version.</p><p>This guide uses Ubuntu 22.04. If you have a different version of Ubuntu or a different Linux distribution, you will need to adjust the installation steps.</p><h2>Installing Nvidia drivers and CUDA:</h2><p>The first thing to do on your Ubuntu 22.04 machine is install the Nvidia driver and the CUDA packages so that your programs can dispatch workloads to the GPU.</p><p>This can be done by running the following commands. On other Ubuntu releases, replace <code>ubuntu2204</code> in the repository URL with your version (for example, <code>ubuntu2404</code>); other distributions have their own repositories on Nvidia's download site.</p><pre><code><code>wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update

sudo apt install cuda

sudo reboot
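</code></code></pre><p>Once the installation finishes and the machine has rebooted, check that both the driver and the CUDA compiler are available. The toolkit installs <code>nvcc</code> under <code>/usr/local/cuda/bin</code>, which is usually not on your <code>PATH</code> by default, so add it first (append the same <code>export</code> line to <code>~/.bashrc</code> to make it permanent):</p><pre><code>export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
nvidia-smi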

nvcc --version
</code></pre><p>You should get an output that looks similar to the following:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pmz0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pmz0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png 424w, https://substackcdn.com/image/fetch/$s_!pmz0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png 848w, https://substackcdn.com/image/fetch/$s_!pmz0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png 1272w, https://substackcdn.com/image/fetch/$s_!pmz0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pmz0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png" width="1456" height="934" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:934,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:112798,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thehypervisor.blog/i/170409701?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pmz0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png 424w, https://substackcdn.com/image/fetch/$s_!pmz0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png 848w, https://substackcdn.com/image/fetch/$s_!pmz0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png 1272w, https://substackcdn.com/image/fetch/$s_!pmz0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0437c4a2-7875-4a0d-8234-8297c0772aff_1675x1075.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In my case, I have an RTX 3090 installed.</p><h2>Writing the add vectors CUDA kernel:</h2><p>Create a file named <code>vector_add.cu</code> and add the following code:</p><pre><code>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;cuda_runtime.h&gt;

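// __global__ marks a kernel: a function that runs on the GPU and is launched from host code.
// Each GPU thread computes one element of c = a + b.
__global__ void add(float *a, float *b, float *c, int n) {
    // Global index of this thread across all blocks in the grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard against extra threads when n is not a multiple of the block size.
    if (i &lt; n) c[i] = a[i] + b[i];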
}
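
// Hypothetical optional variant, not called by main() below: a "grid-stride loop" kernel.
// Each thread starts at its global index and then jumps ahead by the total number
// of threads in the grid, so even a grid smaller than n still covers every element.
__global__ void add_grid_stride(float *a, float *b, float *c, int n) {
    int stride = blockDim.x * gridDim.x;  // total number of threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i &lt; n; i += stride) {
        c[i] = a[i] + b[i];
    }
}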

// Print a readable message and exit if a CUDA runtime call failed.
void check(cudaError_t err, const char *msg) {
    if (err != cudaSuccess) {
        printf("Error: %s - %s\n", msg, cudaGetErrorString(err));
        exit(1);
    }
}

int main() {
    int n = 1024;
    size_t size = n * sizeof(float);
    
    float *a = (float*)malloc(size);
    float *b = (float*)malloc(size);
    float *c = (float*)malloc(size);
    
    for (int i = 0; i &lt; n; i++) {
        a[i] = i;
        b[i] = i * 2;
    }
    
    // Allocate the input and output buffers in GPU (device) memory.
    float *d_a, *d_b, *d_c;
    check(cudaMalloc(&amp;d_a, size), "malloc a");
    check(cudaMalloc(&amp;d_b, size), "malloc b");
    check(cudaMalloc(&amp;d_c, size), "malloc c");
    
    // Copy the inputs from host (CPU) memory to device (GPU) memory.
    check(cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice), "copy b");
    
    // 256 threads per block; round up so that blocks * threads &gt;= n.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    
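    // Launch blocks x threads GPU threads. Launches are asynchronous, so check
    // for launch errors first, then wait for the GPU to finish.
    add&lt;&lt;&lt;blocks, threads&gt;&gt;&gt;(d_a, d_b, d_c, n);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "sync");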
    
    // Copy the result from device memory back to host memory.
    check(cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost), "copy result");
    
    for (int i = 0; i &lt; 10; i++) {
        printf("%.0f + %.0f = %.0f\n", a[i], b[i], c[i]);
    }
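    
    // Optional sanity check: verify every element on the host, not just the
    // first 10 printed above. Here a[i] + b[i] = 3 * i, which a float represents
    // exactly for these small values, so an exact comparison is safe.
    int mismatches = 0;
    for (int i = 0; i &lt; n; i++) {
        if (c[i] != a[i] + b[i]) mismatches++;
    }
    printf("%d mismatches out of %d elements\n", mismatches, n);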
    
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    free(a);
    free(b);
    free(c);
    
    return 0;
}</code></pre><h2>Compiling and running the kernel:</h2><p>Now that we have the CUDA source file, compile it with <code>nvcc</code> and run the resulting binary:</p><pre><code>nvcc -o vector_add vector_add.cu
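# If the program fails with "no kernel image is available for execution on the device",
# rebuild for your GPU's compute capability, e.g. sm_86 for an RTX 3090:
# nvcc -arch=sm_86 -o vector_add vector_add.cu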
./vector_add</code></pre><p>The program prints the first ten sums, and the output should look like the following:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bdMT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bdMT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png 424w, https://substackcdn.com/image/fetch/$s_!bdMT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png 848w, https://substackcdn.com/image/fetch/$s_!bdMT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png 1272w, https://substackcdn.com/image/fetch/$s_!bdMT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bdMT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png" width="210" height="385" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:385,&quot;width&quot;:210,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11445,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thehypervisor.blog/i/170409701?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bdMT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png 424w, https://substackcdn.com/image/fetch/$s_!bdMT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png 848w, https://substackcdn.com/image/fetch/$s_!bdMT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png 1272w, https://substackcdn.com/image/fetch/$s_!bdMT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1788f746-70ba-4c8c-adda-5dd307cdd883_210x385.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>And now you have run your first CUDA kernel, one that adds two vectors! If you want to dive deeper, the next blog post takes a closer look at this kernel and the driver program around it.</p>]]></content:encoded></item></channel></rss>