How does streaming UI work with OpenAI's API in Next.js?

Next.js Edge Functions use the OpenAI SDK's stream option to receive tokens incrementally. The function returns a ReadableStream to the client, which renders each token as it arrives using the Vercel AI SDK's useChat hook or a custom streaming hook, so the user never waits for the full response payload.

Why use Edge Functions instead of standard serverless for AI endpoints?

Edge Functions run on Vercel's edge network with minimal cold-start overhead and support the Web Streams API natively. For AI streaming in this project, that meant the first token reached the user within 100–200ms of the request, compared to 1–3 seconds of cold start latency on traditional Node.js serverless functions.

How does Postgres LISTEN/NOTIFY integrate with a Next.js frontend?

A lightweight WebSocket relay service subscribes to Postgres NOTIFY channels. When a database trigger fires (e.g., new prediction result inserted), the relay pushes the event to connected clients. The Next.js frontend maintains a single WebSocket connection and updates its React state on each message, producing real-time dashboard updates without polling.

Next.js AI SaaS Application: Streaming UI for Real-Time Predictive Analytics | Froz

A predictive analytics startup came to me with a critical UX failure: their AI-powered dashboard froze the browser for 8 seconds while waiting for OpenAI to return massive JSON payloads. I rebuilt the entire inference pipeline using Next.js Edge Functions with streaming responses and Postgres LISTEN/NOTIFY for real-time data propagation — reducing perceived latency to near-instant streamed output and cutting average session abandonment by 65%.

Why the Browser Was Freezing

The original implementation made a standard fetch call to a Node.js API route that called OpenAI's Chat Completions API with stream: false. The entire response — often 4,000+ tokens of structured analytical output — had to complete server-side before a single byte was sent to the client.

During that 6–8 second wait:

The UI showed a spinner with zero progress indication. Users had no idea whether the system was working or broken.
The main thread blocked on JSON parsing. The full payload arrived as a single massive JSON object, and JSON.parse() on a 50KB string caused a visible jank spike.
Session abandonment spiked. Analytics showed 40% of users navigated away before results appeared.

The startup had investigated WebSocket solutions and considered switching to a Python backend. Both approaches would have introduced significant infrastructure complexity. The streaming fix itself required no additional infrastructure.

The Architecture: Edge Functions + Streaming

Next.js Edge Functions support the Web Streams API natively. Instead of buffering the entire AI response server-side, the Edge Function opens a streaming connection to OpenAI and pipes each token directly to the client as a ReadableStream.

```typescript
// app/api/analyze/route.ts: Edge Function with streaming
import OpenAI from 'openai';

export const runtime = 'edge';

interface AnalysisRequest {
  prompt: string;
  datasetId: string;
}

export async function POST(request: Request): Promise<Response> {
  const { prompt, datasetId }: AnalysisRequest = await request.json();

  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    stream: true,
    messages: [
      {
        role: 'system',
        content: `You are a data analyst. Analyze dataset ${datasetId}. 
                  Return structured insights as markdown.`,
      },
      { role: 'user', content: prompt },
    ],
  });

  const encoder = new TextEncoder();

  const readable = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const text = chunk.choices[0]?.delta?.content ?? '';
          if (text) {
            // JSON-encode each token so newlines in the markdown output
            // cannot break the "\n\n" SSE frame delimiter.
            controller.enqueue(encoder.encode(`data: ${JSON.stringify(text)}\n\n`));
          }
        }
        controller.enqueue(encoder.encode('data: [DONE]\n\n'));
        controller.close();
      } catch (error) {
        // Surface upstream failures to the client instead of hanging the stream.
        controller.error(error);
      }
    },
    cancel() {
      // Stop the OpenAI request if the client disconnects mid-stream.
      stream.controller.abort();
    },
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    },
  });
}
```

The first token reaches the browser within roughly 150ms of the request. The user sees the analysis materialize word by word, an experience that feels instantaneous regardless of total generation time.

Client-Side Streaming Hook

The frontend consumes the stream with a custom React hook that appends tokens to a state buffer. No third-party streaming library is required.
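The hook itself is not reproduced here, so the following is only a minimal sketch of the parsing step such a hook needs. It assumes each server frame is a single `data:` line terminated by a blank line, that the payload is either a JSON-encoded string or raw token text, and that `[DONE]` marks the end of the stream. The class name `SseTokenParser` is hypothetical.

```typescript
// Incremental parser for "data: ...\n\n" frames. Network chunks can split a
// frame anywhere, so incomplete input is buffered until its delimiter arrives.
class SseTokenParser {
  private buffer = '';
  done = false;

  /** Feed one decoded chunk; returns the tokens completed by this chunk. */
  push(chunk: string): string[] {
    this.buffer += chunk;
    const tokens: string[] = [];
    let boundary = this.buffer.indexOf('\n\n');
    while (boundary !== -1 && !this.done) {
      const frame = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      if (frame.startsWith('data: ')) {
        const payload = frame.slice('data: '.length);
        if (payload === '[DONE]') {
          this.done = true;
        } else {
          tokens.push(decodePayload(payload));
        }
      }
      boundary = this.buffer.indexOf('\n\n');
    }
    return tokens;
  }
}

// Accept a JSON-encoded string payload, falling back to the raw text.
function decodePayload(payload: string): string {
  try {
    const parsed: unknown = JSON.parse(payload);
    return typeof parsed === 'string' ? parsed : payload;
  } catch {
    return payload; // not JSON: treat the payload as raw token text
  }
}
```

Inside a hook, each chunk read from `response.body.getReader()` would be decoded with a `TextDecoder`, passed to `push()`, and the returned tokens appended to state (for example `setOutput(prev => prev + token)`). Once `done` is true, the typing cursor can be hidden.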

Screenshot of the AI analytics dashboard rendering a streaming response in real-time with a typing cursor animation

The component renders a markdown preview that updates on every token. A subtle cursor animation at the end of the output provides a visual cue that generation is still in progress. When the [DONE] signal arrives, the cursor disappears and the final output is committed to the database for historical retrieval.

Real-Time Data Propagation with LISTEN/NOTIFY

The dashboard doesn't just display AI-generated insights. It also shows real-time metrics from the client's data pipeline: new records ingested, prediction confidence scores, and anomaly alerts.

Instead of polling an API endpoint every 5 seconds (the original approach), the rebuilt system uses PostgreSQL's native LISTEN/NOTIFY mechanism:

A database trigger fires on every INSERT into the prediction_results table.
The trigger calls pg_notify('new_prediction', payload).
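A lightweight Node.js relay service holds a persistent Postgres connection, subscribes to the channel with LISTEN, and pushes events to connected dashboard clients over WebSocket. Because LISTEN needs a long-lived connection, the relay has to run as an always-on process rather than a short-lived serverless function.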
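The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the function and trigger names in the SQL are hypothetical, the whole row is assumed to fit in a NOTIFY payload, and the fan-out class works against any object with a `send` method so it can be read without the `pg` or `ws` packages installed.

```typescript
// Assumed DDL for the trigger (names are hypothetical). NOTIFY payloads must
// stay under 8000 bytes in the default configuration, so wide rows would need
// to send only an id instead of the full row.
const NOTIFY_TRIGGER_SQL = `
CREATE OR REPLACE FUNCTION notify_new_prediction() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('new_prediction', row_to_json(NEW)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER prediction_results_notify
  AFTER INSERT ON prediction_results
  FOR EACH ROW EXECUTE FUNCTION notify_new_prediction();
`;

// Anything that can receive a text message, e.g. a WebSocket connection.
interface RelayClient {
  send(data: string): void;
}

// Fans each notification payload out to every connected client.
class NotificationHub {
  private clients = new Set<RelayClient>();

  /** Register a client; the returned function unregisters it. */
  add(client: RelayClient): () => void {
    this.clients.add(client);
    return () => {
      this.clients.delete(client);
    };
  }

  /** Send the payload to all clients; returns how many sends succeeded. */
  broadcast(payload: string): number {
    let delivered = 0;
    this.clients.forEach((client) => {
      try {
        client.send(payload);
        delivered += 1;
      } catch {
        this.clients.delete(client); // drop clients whose socket has failed
      }
    });
    return delivered;
  }
}

// Wiring sketch, assuming the `pg` and `ws` packages (not executed here):
//   await pgClient.query('LISTEN new_prediction');
//   pgClient.on('notification', (msg) => hub.broadcast(msg.payload ?? ''));
//   wss.on('connection', (socket) => {
//     const remove = hub.add(socket);
//     socket.on('close', remove);
//   });
```

Keeping the fan-out separate from the database and socket libraries means the relay logic can be unit-tested with plain objects.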

The client's browser receives new data within 50ms of the database insert. No polling. No wasted API calls. No stale data.
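On the browser side, each pushed message has to be validated and folded into dashboard state. A minimal sketch of that step, assuming the relay forwards the inserted row as JSON: the field names `id`, `confidence`, and `isAnomaly` are hypothetical, since the columns of `prediction_results` are not shown here.

```typescript
// Hypothetical shape of one pushed prediction row.
interface PredictionEvent {
  id: number;
  confidence: number;
  isAnomaly: boolean;
}

interface DashboardState {
  recordsIngested: number;
  latestConfidence: number | null;
  anomalyIds: number[];
}

const initialDashboardState: DashboardState = {
  recordsIngested: 0,
  latestConfidence: null,
  anomalyIds: [],
};

// Parse and validate a raw WebSocket message; returns null for anything
// that is not a well-formed prediction event.
function parsePredictionEvent(raw: string): PredictionEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== 'object' || value === null) return null;
  const { id, confidence, isAnomaly } = value as Record<string, unknown>;
  if (typeof id !== 'number' || typeof confidence !== 'number') return null;
  return { id, confidence, isAnomaly: isAnomaly === true };
}

// Pure reducer: returns a new state object and never mutates the old one.
function dashboardReducer(state: DashboardState, event: PredictionEvent): DashboardState {
  return {
    recordsIngested: state.recordsIngested + 1,
    latestConfidence: event.confidence,
    anomalyIds: event.isAnomaly ? [...state.anomalyIds, event.id] : state.anomalyIds,
  };
}
```

In a component this pairs with React's `useReducer(dashboardReducer, initialDashboardState)`: the socket's `onmessage` handler calls `parsePredictionEvent(event.data)` and dispatches the result when it is not null.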

Visualization Performance

The dashboard renders heavy chart components — time series plots with 10,000+ data points, correlation heatmaps, and distribution histograms. These components are loaded as dynamic imports with next/dynamic and ssr: false to prevent server-side rendering of canvas-dependent libraries.

The chart library itself uses <canvas> rendering instead of SVG, avoiding the DOM node explosion that SVG-based charting libraries cause with large datasets. With 15,000 data points rendered on a time series chart, frame rate stays above 55fps on a 2020 MacBook Air.
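To illustrate why canvas scales better here, the sketch below draws a whole series as one path with a single stroke call, so the point count adds no DOM nodes. It is not the chart library's implementation (the library is not named here). The `PathContext` interface is a hypothetical subset of `CanvasRenderingContext2D`, declared so the function can be exercised without a browser.

```typescript
// The four canvas methods this sketch needs; a real
// CanvasRenderingContext2D satisfies this interface structurally.
interface PathContext {
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  stroke(): void;
}

interface SeriesPoint {
  t: number; // timestamp or x value
  value: number;
}

// Scale the points to the canvas size and draw them as one polyline.
function drawSeries(ctx: PathContext, points: SeriesPoint[], width: number, height: number): void {
  if (points.length < 2) return;

  let tMin = Infinity;
  let tMax = -Infinity;
  let vMin = Infinity;
  let vMax = -Infinity;
  for (const p of points) {
    tMin = Math.min(tMin, p.t);
    tMax = Math.max(tMax, p.t);
    vMin = Math.min(vMin, p.value);
    vMax = Math.max(vMax, p.value);
  }
  const tSpan = tMax - tMin || 1; // avoid division by zero for flat data
  const vSpan = vMax - vMin || 1;

  ctx.beginPath();
  points.forEach((p, i) => {
    const x = ((p.t - tMin) / tSpan) * width;
    const y = height - ((p.value - vMin) / vSpan) * height; // canvas y grows downward
    if (i === 0) {
      ctx.moveTo(x, y);
    } else {
      ctx.lineTo(x, y);
    }
  });
  ctx.stroke(); // one draw call for the whole series
}
```

In the browser, `ctx` would come from `canvas.getContext('2d')`.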

Performance Comparison

| Metric | Original Node.js API | Next.js Edge Streaming Build |
|---|---|---|
| Time to First Token | 6,200ms | ~150ms |
| Perceived Response Time | 8,000ms (full wait) | Instant (streaming) |
| Session Abandonment | 40% | 14% (−65%) |
| Dashboard Data Freshness | 5s polling interval | < 50ms (LISTEN/NOTIFY) |
| Chart Rendering (15K points) | 800ms (SVG) | 120ms (Canvas) |
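The −65% figure is a relative change, not a drop in percentage points. The arithmetic behind it and behind the latency ratios can be checked with a few lines. The helper names are hypothetical and the inputs are the values from the table.

```typescript
// Relative change between two measurements; negative means a reduction.
function relativeChange(before: number, after: number): number {
  return (after - before) / before;
}

// How many times smaller the new measurement is.
function speedup(beforeMs: number, afterMs: number): number {
  return beforeMs / afterMs;
}

// 40% -> 14% abandonment: a 26-point drop, which is 65% in relative terms.
const abandonmentChange = relativeChange(40, 14); // -0.65
// 6,200ms -> 150ms time to first token.
const firstTokenSpeedup = speedup(6200, 150); // about 41x
// 800ms -> 120ms chart rendering.
const chartSpeedup = speedup(800, 120); // about 6.7x
```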

Performance comparison chart showing the before/after latency reduction from buffered to streamed AI responses

Deliverables

The client received the full Next.js codebase, the Supabase database with all triggers and functions, and the Vercel deployment configuration. The OpenAI API key is the client's own — I have zero access to their inference costs or data.

The system is model-agnostic. Swapping GPT-4o for Claude, Gemini, or a self-hosted LLM requires changing a single API client module. The streaming infrastructure remains identical.
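One way to structure such a client module is to isolate the only provider-specific step, pulling text out of a stream chunk, behind a small function type. The sketch below is an assumption about how that could look, not the delivered module. The chunk interfaces mirror only the fields this sketch reads from OpenAI and Anthropic streaming events and should be checked against the SDK versions in use.

```typescript
// Minimal chunk shapes (assumptions; the real SDK types carry more fields).
interface OpenAiChunk {
  choices: Array<{ delta?: { content?: string | null } }>;
}

interface AnthropicEvent {
  type: string;
  delta?: { type?: string; text?: string };
}

// The single provider-specific piece: chunk in, token text out.
type TextExtractor<TChunk> = (chunk: TChunk) => string;

const fromOpenAi: TextExtractor<OpenAiChunk> = (chunk) =>
  chunk.choices[0]?.delta?.content ?? '';

const fromAnthropic: TextExtractor<AnthropicEvent> = (event) =>
  event.type === 'content_block_delta' && event.delta?.type === 'text_delta'
    ? event.delta?.text ?? ''
    : '';

// Provider-independent code only ever sees plain strings.
function collectText<TChunk>(chunks: TChunk[], extract: TextExtractor<TChunk>): string {
  return chunks.map(extract).join('');
}
```

With this split, the streaming route would call the extractor for whichever provider is configured and leave the SSE framing and client hook untouched.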
