When a "message" type UnifiedMessage arrives at the Communication Manager, it may carry content that the agent cannot directly understand — audio recordings, images, video. The adapter pipeline solves this by enriching each ContentItem in-place before the message is placed on the inbound queue.

Design principles

  • Enrich in-place, never add or remove items — Adapters write their output into item.metadata["description"] on the existing ContentItem. The original body (audio data, image URL, etc.) is always preserved. No new content items are added, and no originals are removed.
  • Concurrent capabilities — Independent adapters (audio and image) run concurrently within a single message via asyncio.gather. A message containing both audio and image content is enriched in parallel.
  • Non-blocking — The entire pipeline runs inside an asyncio.Task spawned per message. receive() on the Communication Manager returns immediately after sending an acknowledgment event, so incoming messages from other devices are never blocked while a transcription or image analysis is in progress.
  • Fail gracefully — If an adapter fails (network error, API quota, etc.) the error is captured and written to item.metadata["adapter_error"]. The message is still queued to the agent, which can then decide how to respond.
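The concurrency model above (asyncio.gather within one message, a fire-and-forget task per message) can be sketched with toy types. The adapter classes and enrich() helper here are illustrative assumptions, not the actual implementation:

```python
# Minimal sketch of the concurrent, in-place enrichment described above.
# ToyAudioAdapter / ToyImageAdapter / enrich() are hypothetical stand-ins.
import asyncio
from dataclasses import dataclass, field

@dataclass
class ContentItem:
    content_type: str
    body: str
    metadata: dict = field(default_factory=dict)

class ToyAudioAdapter:
    def can_handle(self, content):
        return any(i.content_type == "audio" for i in content)

    async def adapt(self, content):
        for item in content:
            if item.content_type == "audio":
                # enrich in place; the original body is untouched
                item.metadata["description"] = f"transcript of {item.body}"

class ToyImageAdapter:
    def can_handle(self, content):
        return any(i.content_type == "image" for i in content)

    async def adapt(self, content):
        for item in content:
            if item.content_type == "image":
                item.metadata["description"] = f"description of {item.body}"

async def enrich(content, adapters):
    # independent adapters run concurrently within a single message;
    # in the real pipeline this coroutine runs inside a per-message Task
    await asyncio.gather(
        *(a.adapt(content) for a in adapters if a.can_handle(content))
    )
    return content
```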

Content enrichment example

A voice message arrives with one audio content item:
{
  "content_type": "audio",
  "body": "https://cdn.example.com/voice.ogg",
  "metadata": { "duration_ms": 3400 }
}
After the audio adapter:
{
  "content_type": "audio",
  "body": "https://cdn.example.com/voice.ogg",
  "metadata": {
    "duration_ms": 3400,
    "description": "Hey, can you check the server logs?"
  }
}
On transcription failure:
{
  "content_type": "audio",
  "body": "https://cdn.example.com/voice.ogg",
  "metadata": {
    "duration_ms": 3400,
    "adapter_error": "Transcription service unavailable"
  }
}

Pipeline flow

(Diagram: message adapter pipeline flow from receive() to the inbound queue.)

Adapter classes

The adapter system uses a three-level class hierarchy.

MessageAdapter (ABC)

The minimal interface every adapter must implement:
class MessageAdapter(ABC):
    def can_handle(self, msg: UnifiedMessage) -> bool: ...
    async def adapt(self, msg: UnifiedMessage) -> UnifiedMessage: ...

ContentTypeAdapter (base class)

A Template Method base for the common case: target a specific content_type, loop over matching items, call process_item(), write the result into item.metadata["description"]. Handles matching, iteration, and error capture. Subclasses only implement two things:
class ContentTypeAdapter(MessageAdapter):
    @property
    @abstractmethod
    def target_content_type(self) -> str: ...

    @abstractmethod
    async def process_item(self, item: ContentItem) -> str: ...

Concrete adapters

Each concrete adapter extends ContentTypeAdapter with one property and one method:
class AudioTranscriptionAdapter(ContentTypeAdapter):
    target_content_type = "audio"

    async def process_item(self, item: ContentItem) -> str:
        # call LangChain Whisper, return transcript
        ...

class ImageUnderstandingAdapter(ContentTypeAdapter):
    target_content_type = "image"

    async def process_item(self, item: ContentItem) -> str:
        # call LangChain vision model, return description
        ...

Adapters

Audio transcription

Class AudioTranscriptionAdapter (hiroserver/hirocli/src/hirocli/runtime/adapters/audio_adapter.py). Transcribes audio content items using LangChain's OpenAI Whisper integration (langchain_community.document_loaders.parsers.audio.OpenAIWhisperParser).
  • Trigger: any ContentItem with content_type == "audio"
  • Reads: item.body — a URL, data URI, or raw base64 audio payload
  • Writes: transcript text into item.metadata["description"]
  • Side effect: optionally sends a message.transcribed event back to the device once transcription is complete
  • Disabled when: OPENAI_API_KEY environment variable is not set (can_handle returns False)

Image understanding

Class ImageUnderstandingAdapter (hiroserver/hirocli/src/hirocli/runtime/adapters/image_adapter.py). Describes image content items using a LangChain multimodal vision model.
  • Trigger: any ContentItem with content_type == "image"
  • Reads: item.body — a URL, data URI, or raw base64 image payload
  • Writes: description text into item.metadata["description"]
  • Default model: openai:gpt-4o-mini (override with IMAGE_VISION_MODEL env var)
  • Default prompt: a generic description prompt (override with IMAGE_ANALYSIS_PROMPT env var)
  • Disabled when: OPENAI_API_KEY environment variable is not set (can_handle returns False)

How the Agent Manager uses enriched messages

The Agent Manager builds its agent input by reading from all content items — not just text:
parts = []
for item in msg.content:
    if item.content_type == "text":
        parts.append(item.body)
    elif "description" in item.metadata:
        parts.append(f"[{item.content_type}]: {item.metadata['description']}")

text_body = "\n".join(parts)
A voice message with a transcript arrives at the agent as:
[audio]: Hey, can you check the server logs?
A message with a text caption and an image arrives as:
Can you tell me what's in this photo?
[image]: A kitchen counter with a coffee maker, a stack of books, and a plant near the window.
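The assembly loop above can be exercised end to end with sample data. ContentItem and build_agent_input() here are minimal stand-ins for illustration, not the Agent Manager's actual types:

```python
# Runnable illustration of the agent-input assembly shown above,
# wrapped in a hypothetical helper for testing.
from dataclasses import dataclass, field

@dataclass
class ContentItem:
    content_type: str
    body: str
    metadata: dict = field(default_factory=dict)

def build_agent_input(content):
    parts = []
    for item in content:
        if item.content_type == "text":
            parts.append(item.body)
        elif "description" in item.metadata:
            parts.append(f"[{item.content_type}]: {item.metadata['description']}")
    return "\n".join(parts)

msg_content = [
    ContentItem("text", "Can you tell me what's in this photo?"),
    ContentItem("image", "https://cdn.example.com/p.jpg",
                {"description": "A kitchen counter with a coffee maker."}),
]
print(build_agent_input(msg_content))
# Can you tell me what's in this photo?
# [image]: A kitchen counter with a coffee maker.
```

Note that a non-text item with neither a description nor a transcript is silently skipped, which matches the fail-gracefully principle: the agent still sees whatever content could be enriched.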

Adding a future adapter

Adding a new adapter (for example, video or PDF) requires only extending ContentTypeAdapter:
class VideoAdapter(ContentTypeAdapter):
    target_content_type = "video"

    async def process_item(self, item: ContentItem) -> str:
        # call video understanding service, return description
        ...
Register it in the pipeline at server startup in server_process.py:
adapter_pipeline = MessageAdapterPipeline([
    AudioTranscriptionAdapter(),
    ImageUnderstandingAdapter(),
    VideoAdapter(),  # add here
])
No other changes are needed.

See also