Read aloud

The highest-level use of the SDK: send a string, hear the AI speak it. Under the hood, the SDK opens an audio_playback session, streams the TTS audio over the channel WebSocket, plays it via new Audio(), and resolves the returned promise when playback finishes.

This mirrors the ReadAloudDemo.tsx example in the SDK's React examples app.

Prerequisites

The @ki-kombinat/delphi-client-js-sdk package installed, plus an API key and the ID of an endpoint with a read-aloud browser action (BOA) configured.

Minimum viable example

```ts
import {DelphiClient} from '@ki-kombinat/delphi-client-js-sdk';

const delphi = new DelphiClient({
  apiDomain: 'api.delphi.ki-kombinat.com',
  apiKey: 'sk_live_…',
});

await delphi.readAloud('Hello, world!', {
  endpointId: '24599c70-1e79-4e52-9819-e2d97acf45a5',
});
```

That's it. The server picks the right read-aloud browser action; you only supply the text. Repeated calls reuse the same session and conversation thread.
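Because each call resolves only when its audio has finished playing, sequential reads can be serialised by chaining the returned promises. A minimal sketch of that pattern; the makeQueue helper is hypothetical, not part of the SDK:

```ts
// Hypothetical helper, not part of the SDK: run async tasks strictly one
// after another by chaining each onto the previous task's promise.
function makeQueue(): (task: () => Promise<void>) => Promise<void> {
  let tail: Promise<void> = Promise.resolve();
  return (task) => {
    // Start the next task whether or not the previous one failed.
    tail = tail.then(task, task);
    return tail;
  };
}

const enqueue = makeQueue();
// e.g. enqueue(() => delphi.readAloud(paragraph, {endpointId}));
```

Reads queued this way play back-to-back in the order they were triggered, even if the triggering UI events fire while earlier audio is still playing.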

Options

```ts
await delphi.readAloud('Some text', {
  endpointId: '24599c70-1e79-4e52-9819-e2d97acf45a5',

  metadata: {from: 'highlight'},

  // Disambiguate when an endpoint has multiple read-aloud BOAs.
  capabilityId: 'cap_123',
  identifier: 'tts-fast',

  // Override the BOA message-type if your runtime uses a custom one.
  messageType: 'browser.action.readAloudFast',

  // Cancel mid-flight.
  signal: abortController.signal,

  // Skip the SDK's built-in audio playback (handle audio yourself).
  disableAutoPlay: true,
  onAudio: (event) => myAudioPlayer.play(event.dataUrl),
});
```
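The signal option takes a standard AbortSignal, so cancellation works exactly like fetch's. A sketch of the controller side, assuming a hypothetical "stop" button handler; the controller's signal is what gets passed as the signal option:

```ts
// A standard AbortController: pass its signal as the `signal` option of
// readAloud, and call abort() to cancel the call mid-flight.
const abortController = new AbortController();

// Hypothetical UI wiring: a stop handler that cancels playback.
const onStopClicked = () => abortController.abort();

onStopClicked();
console.log(abortController.signal.aborted); // true
```

Once aborted, the signal stays aborted, so create a fresh AbortController for each read you may want to cancel independently.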

Inspect the audio

The promise resolves with the assembled BrowserAudioEvent:

```ts
const audio = await delphi.readAloud('Welcome back!', {endpointId});

console.log(audio.dataUrl);  // data: URL ready to feed into <audio>
console.log(audio.mimeType); // e.g. 'audio/wav'
console.log(audio.metadata); // BOA metadata from the server
```
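If you handle audio yourself via disableAutoPlay, the dataUrl can be unpacked directly. A sketch with a hypothetical parseDataUrl helper (not part of the SDK), assuming the base64-encoded data: URL shape shown above:

```ts
// Hypothetical helper, not part of the SDK: split a base64 data: URL
// like 'data:audio/wav;base64,<payload>' into MIME type and payload.
function parseDataUrl(dataUrl: string): {mimeType: string; base64: string} {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) throw new Error('not a base64 data: URL');
  return {mimeType: match[1], base64: match[2]};
}

const {mimeType, base64} = parseDataUrl('data:audio/wav;base64,UklGRg==');
console.log(mimeType); // 'audio/wav'
console.log(Buffer.from(base64, 'base64').byteLength); // 4 ('RIFF' header bytes)
```

From there the decoded bytes can go to the Web Audio API, a MediaSource, or straight into an audio element's src as the data: URL itself.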

Closing the session

```ts
// Close just audio playback for this endpoint.
await delphi.endSession(endpointId, 'audio_playback');

// Or close every mode for this endpoint at once.
await delphi.endSession(endpointId);
```

Power-user variant

If you want explicit control over the session lifecycle and the ability to push browser context the AI can reference:

```ts
const session = await delphi.openSession({
  endpointId,
  mode: 'audio_playback',
});

session.setBrowserContext({
  text: document.querySelector('article')?.innerText ?? '',
  source: 'page',
  url: window.location.href,
});

session.sendBrowserAction({
  messageType: 'browser.action.readAloud',
});
await session.audioDone();

session.sendBrowserAction({
  messageType: 'browser.action.transformAndRead',
  text: 'Summarise the article.',
});
await session.audioDone();

await session.close();
```
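Since playback can fail mid-flight (network drop, abort), it is worth wrapping the explicit lifecycle in try/finally so close() always runs. A sketch against a minimal stand-in interface; the real session object from openSession is assumed to expose at least these methods:

```ts
// Minimal stand-in for the session returned by openSession; the real
// object is assumed to expose at least these methods.
interface AudioSession {
  sendBrowserAction(action: {messageType: string; text?: string}): void;
  audioDone(): Promise<void>;
  close(): Promise<void>;
}

async function readArticleAloud(session: AudioSession): Promise<void> {
  try {
    session.sendBrowserAction({messageType: 'browser.action.readAloud'});
    await session.audioDone();
  } finally {
    // Runs even if audioDone() rejects, so the session is never leaked.
    await session.close();
  }
}
```

The same pattern applies to the two-action sequence above: put both sendBrowserAction/audioDone pairs inside the try block and leave close() in finally.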

Next step

Try the voice call quick start, or jump straight to the React guide for useDelphiSession.