<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>GenAI on Route179</title>
    <link>https://route179.dev/tags/genai/</link>
    <description>Recent content in GenAI on Route179</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <copyright>2026 Sheng Chen</copyright>
    <lastBuildDate>Tue, 30 Jun 2026 00:00:00 +1000</lastBuildDate>
    <atom:link href="https://route179.dev/tags/genai/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Optimizing vLLM Cold Start with Model Streaming and Compile Caching</title>
      <link>https://route179.dev/2026/06/30/optimizing-vllm-cold-start-with-model-streaming-and-compile-caching/</link>
      <pubDate>Tue, 30 Jun 2026 00:00:00 +1000</pubDate>
      <guid>https://route179.dev/2026/06/30/optimizing-vllm-cold-start-with-model-streaming-and-compile-caching/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve recently been working on an AI-at-the-edge project, &lt;a href=&#34;https://github.com/sc13912/safetylens_v2/&#34;&gt;SafetyLens&lt;/a&gt;, which is a topic for another post. While developing the demo, I frequently needed to tweak the model serving engine, swap models in and out, and refine configurations. Every one of those changes meant restarting vLLM, and each restart triggered a full cold start that took five minutes or more before the server was ready to serve requests again.&lt;/p&gt;
&lt;p&gt;And that&amp;rsquo;s with just one &lt;code&gt;Qwen3.6-35B-A3B&lt;/code&gt; model loaded from local NVMe storage on a single NVIDIA DGX Spark node. It will only get worse once I scale the deployment to a multi-node cluster backed by an external PVC, where the weights have to travel over the network on every start. So I went looking for ways to cut vLLM cold-start time, and it turns out most of it is recoverable, with no changes to the model or its output quality.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deploy DeepSeek-R1-0528-671B on Amazon EKS using vLLM</title>
      <link>https://route179.dev/publications/repost-eks-deepseek-vllm/</link>
      <guid>https://route179.dev/publications/repost-eks-deepseek-vllm/</guid>
      <description></description>
    </item>
    <item>
      <title>Deploy production generative AI at the edge using Amazon EKS Hybrid Nodes with NVIDIA DGX</title>
      <link>https://route179.dev/publications/blog-eks-hybrid-genai-dgx/</link>
      <guid>https://route179.dev/publications/blog-eks-hybrid-genai-dgx/</guid>
      <description></description>
    </item>
  </channel>
</rss>
