Streaming Responses

To create a smooth user experience, Caddie supports streaming text and other chat UI data to the client, so users begin seeing a chat response before it has fully completed. Alternatively, if you want to wait until the chat has completed, you can use a "blocking" approach to chat requests. While a blocking response makes for a less polished user experience, it can be useful for testing purposes.

Line-by-line streaming

To stream AI chat responses line-by-line, make a POST request to the /api/chat endpoint.

const response = await fetch('http://localhost:3000/api/chat', {
  method: 'POST',
  headers: {
    'X-API-KEY': '',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    agentId: '',
    message: {
      content: ''
    },
    impersonatedUser: {
      id: '',
      email: ''
    }
  })
});

// Read the response body as a stream and print each chunk as it arrives
const reader = response.body.getReader();
const decoder = new TextDecoder('utf-8');
while (true) {
  const { done, value } = await reader.read();
  if (done) { break; }
  const chunk = decoder.decode(value, { stream: true });
  process.stdout.write(chunk);
}
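
Note that network chunks are not guaranteed to line up with the protocol's line boundaries, so a single read may end mid-line. Below is a minimal sketch of a variant of the loop above that buffers chunks until a full line is available, assuming the same response object from the fetch call above:

let buffer = '';
const reader = response.body.getReader();
const decoder = new TextDecoder('utf-8');
while (true) {
  const { done, value } = await reader.read();
  if (done) { break; }
  buffer += decoder.decode(value, { stream: true });
  // Emit every complete line; keep the trailing partial line in the buffer
  const lines = buffer.split('\n');
  buffer = lines.pop();
  for (const line of lines) {
    console.log(line);
  }
}
if (buffer.length > 0) {
  console.log(buffer); // flush any trailing partial line
}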

If you are using React as your frontend framework, you can use the Vercel AI SDK for chat streaming: https://v4.ai-sdk.dev/. If you are using another framework, or do not wish to use the Vercel AI SDK UI elements, you can use any dynamic frontend rendering approach that supports streaming data over Server-Sent Events.
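
If you take the React route, here is a minimal sketch using the AI SDK v4 useChat hook. Note that useChat posts a "messages" array by default rather than the single "message" object shown above, so whether Caddie's /api/chat accepts this shape unchanged is an assumption; the extra body fields mirror the fetch example:

import { useChat } from '@ai-sdk/react';

export function Chat() {
  // Point the hook at Caddie's chat endpoint; the header and body fields
  // mirror the fetch example above and are merged into each request
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: 'http://localhost:3000/api/chat',
    headers: { 'X-API-KEY': '' },
    body: {
      agentId: '',
      impersonatedUser: { id: '', email: '' }
    }
  });

  return (
    <form onSubmit={handleSubmit}>
      {messages.map((m) => (
        <div key={m.id}>{m.role}: {m.content}</div>
      ))}
      <input value={input} onChange={handleInputChange} />
    </form>
  );
}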

"Blocking" chat response

A "blocking" chat response waits until the response has been completed, then shows it all at once.

const response = await fetch('http://localhost:3000/api/chat', {
  method: 'POST',
  headers: {
    'X-API-KEY': '',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    agentId: '',
    message: {
      content: ''
    },
    impersonatedUser: {
      id: '',
      email: ''
    }
  })
});

const chatResponse = await response.text();
console.log(chatResponse);
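
Because nothing is returned until the whole response is ready, it is worth checking the HTTP status before reading the body. Here is a minimal sketch wrapping the request above in a reusable helper (the helper name is illustrative, and impersonatedUser is omitted for brevity):

// Illustrative helper; wraps the blocking request with basic error handling
async function sendBlockingChat(apiKey, agentId, content) {
  const response = await fetch('http://localhost:3000/api/chat', {
    method: 'POST',
    headers: {
      'X-API-KEY': apiKey,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      agentId,
      message: { content }
    })
  });
  if (!response.ok) {
    throw new Error('Chat request failed with status ' + response.status);
  }
  // Resolves only once the full response body has arrived
  return response.text();
}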

Streaming protocol

Caddie uses version 4 of the Vercel AI SDK Data Stream protocol for streaming data. To replicate Caddie's UI features while embedding the Agent in your own chat UI, see the Vercel AI SDK UI documentation for more information on how to interpret the components of your chat response.

An example of what this text stream looks like for the prompt "Tell me about the history of Seattle, Washington" is shown below:

8:[{"chatId":"9032de33-909a-4c5a-a8ba-f3729673a6ef"}]
f:{"messageId":"0cc34675-e14f-4ecf-b96d-260e703f4bc1"}
9:{"toolCallId":"tooluse_nbBkxlpiRYOimkSc5mxBtA","toolName":"searchKnowledgebase","args":{"question":"history of Seattle Washington"}}
a:{"toolCallId":"tooluse_nbBkxlpiRYOimkSc5mxBtA","result":[]}
e:{"finishReason":"tool-calls","usage":{"promptTokens":1277,"completionTokens":59},"isContinued":false}
f:{"messageId":"0c4ff480-fbcc-4915-98ce-912cceb70d5f"}
0:"I'd be "
0:"happy "
0:"to "
0:"tell you "
0:"about "
0:"the "
0:"history of "
0:"Seattle, "
0:"Washington! "
0:"Here's "
0:"an "
0:"overview "
0:"of "
0:"the "
0:"city's "
0:"fascinating "
0:"development:\n\n" ... (chat continues)
e:{"finishReason":"stop","usage":{"promptTokens":1350,"completionTokens":527},"isContinued":false}
d:{"finishReason":"stop","usage":{"promptTokens":2627,"completionTokens":586}}

The first line of the response contains the id of the chat session, which is only present when a new chat session is created. The next few lines encode the tool call to search the knowledge base, then provide the id of the AI's message response. Text streaming lines (which are likely what you want to show to the client) are prefixed with "0:". Finally, the chat concludes with a "finishReason", which indicates why the response ended. The value "stop" indicates a natural stopping point in the Agent's response; other reasons include the Agent reaching the maximum number of tokens allowed in its response, for example.
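
As a minimal sketch of pulling just the display text out of the stream, assuming complete lines (for example, from the line-buffered reader shown earlier), you can split each line on its first colon and keep only the "0:" parts; the prefix handling here follows the example output above rather than an exhaustive list of part types:

// Extract the text content from one protocol line, or null for other parts
function parseTextPart(line) {
  const separator = line.indexOf(':');
  if (separator === -1) { return null; }
  const prefix = line.slice(0, separator);
  const payload = line.slice(separator + 1);
  // "0:" lines carry a JSON-encoded string of text to display
  return prefix === '0' ? JSON.parse(payload) : null;
}

// Usage: feed complete lines from the stream and accumulate display text
let displayText = '';
for (const line of ['0:"Hello, "', '0:"world!"', 'e:{"finishReason":"stop"}']) {
  const text = parseTextPart(line);
  if (text !== null) { displayText += text; }
}
console.log(displayText); // "Hello, world!"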