WebSocket API

WebSocket endpoint for real-time face recognition and person identification.

Overview

The WebSocket API provides a bidirectional communication channel for real-time face detection and recognition. This is the core functionality of the DDFR application, enabling continuous video frame processing and instant person identification.

Architecture

The WebSocket implementation uses an asynchronous processing model:

  1. Client Connection: Client establishes WebSocket connection to /ws
  2. Frame Transmission: Client sends binary image data (JPEG/PNG encoded frames)
  3. Asynchronous Processing: Server processes frames in a thread pool executor to avoid blocking
  4. Response Transmission: Server returns JSON results with detected faces and identities
  5. Rate Limiting: Handled client-side (recommended: 50ms intervals = 20 FPS)

Performance Considerations

Thread Pool Configuration

  • Executor: Single-threaded (max_workers=1) to ensure sequential frame processing
  • Rationale: Prevents race conditions and ensures consistent state during face recognition
  • Note: NumPy threading is disabled via environment variables to prevent conflicts
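
A minimal sketch of this configuration (the variable name matches the executor referenced in the handler source below; its exact placement in the module is an assumption):

from concurrent.futures import ThreadPoolExecutor

# A single worker guarantees strictly sequential frame processing:
# the face engine never handles two frames concurrently.
executor = ThreadPoolExecutor(max_workers=1)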

Processing Pipeline

  1. Image Decoding: Binary bytes → OpenCV BGR frame
  2. Face Detection: InsightFace model detects faces in frame
  3. Embedding Extraction: Face features extracted as embedding vectors
  4. Person Identification: FAISS/NumPy similarity search against the database
  5. Response Formatting: Results serialized as JSON
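
Step 1 can be sketched as follows; it mirrors the decoding logic in process_image_sync (full source under API Reference). The decode_frame wrapper is hypothetical:

import cv2
import numpy as np

def decode_frame(image_bytes: bytes):
    """Step 1: raw bytes -> OpenCV BGR frame (None if decoding fails)."""
    np_arr = np.frombuffer(image_bytes, np.uint8)
    return cv2.imdecode(np_arr, cv2.IMREAD_COLOR)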

Batch Processing

When multiple faces are detected in a single frame:

  • All faces are processed in parallel during embedding extraction
  • Batch identification is performed using vectorized operations
  • Results maintain the face-to-identity mapping

Connection

Endpoint

ws://localhost:8000/ws

For HTTPS:

wss://your-domain.com/ws

Connection Lifecycle

  1. Connect: Client initiates WebSocket handshake
  2. Accept: Server accepts connection (websocket.accept())
  3. Loop: Continuous frame processing until disconnect
  4. Disconnect: Graceful handling with WebSocketDisconnect exception
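
These steps map directly onto the endpoint implementation; a stripped-down skeleton (the full source appears under API Reference below):

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()                    # 2. Accept
    try:
        while True:                             # 3. Loop: one frame per iteration
            data = await websocket.receive_bytes()
            # ... process frame, send JSON result ...
    except WebSocketDisconnect:                 # 4. Disconnect
        pass  # connection cleaned up; the real handler logs the disconnect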

Message Protocol

Client → Server (Binary)

Format: Raw binary image data

Supported Formats:

  • JPEG encoded images
  • PNG encoded images
  • Any format supported by OpenCV imdecode()

Example (JavaScript):

const canvas = document.getElementById('video-canvas');
canvas.toBlob((blob) => {
  blob.arrayBuffer().then((buffer) => {
    websocket.send(buffer);
  });
}, 'image/jpeg', 0.8);

Example (Python):

import asyncio
import cv2
import websockets

async def send_one_frame():
    # Encode a single image and send it over an open connection
    async with websockets.connect("ws://localhost:8000/ws") as websocket:
        frame = cv2.imread('image.jpg')
        _, buffer = cv2.imencode('.jpg', frame)
        await websocket.send(buffer.tobytes())

asyncio.run(send_one_frame())

Server → Client (JSON)

Format: JSON object with status and faces array

Response Structure:

{
  "status": "ok",
  "faces": [
    {
      "id": "123_456",
      "top": 100,
      "right": 300,
      "bottom": 400,
      "left": 200,
      "name": "John",
      "surname": "Doe",
      "age": 30,
      "relationship": "amico",
      "role": "guest"
    }
  ]
}

Unknown Person Structure:

{
  "status": "ok",
  "faces": [
    {
      "id": "123_456",
      "top": 100,
      "right": 300,
      "bottom": 400,
      "left": 200,
      "name": "Unknown",
      "surname": null,
      "age": 0,
      "relationship": null,
      "role": null
    }
  ]
}

Empty Response (No faces detected):

{
  "status": "ok",
  "faces": []
}

Error Response (Decoding failure): No message is sent for that frame (process_image_sync returns None, so the server skips the send). The connection remains open; the client should skip the frame and continue.

Field Descriptions

Face Object

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the face in the format "{top}_{left}" (pixel coordinates) |
| top | integer | Top Y coordinate of the bounding box |
| right | integer | Right X coordinate of the bounding box |
| bottom | integer | Bottom Y coordinate of the bounding box |
| left | integer | Left X coordinate of the bounding box |
| name | string | Person's first name, or "Unknown" if not identified |
| surname | string | Person's last name, or null if not identified |
| age | integer | Age calculated from the person's birthday, or 0 if not identified |
| relationship | string | Relationship type enum value, or null if not identified |
| role | string | Person role enum value ("user" or "guest"), or null if not identified |
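
As an illustration, a client could render these fields onto the frame that produced them; a hypothetical sketch (draw_faces is not part of the API):

import cv2

def draw_faces(frame, faces: list[dict]):
    """Overlay bounding boxes and name labels from a response onto a frame."""
    for face in faces:
        left, top = face["left"], face["top"]
        right, bottom = face["right"], face["bottom"]
        label = face["name"] if face["surname"] is None else f'{face["name"]} {face["surname"]}'
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        cv2.putText(frame, label, (left, top - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame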

Face ID Format

The id field uses pixel coordinates: "{top}_{left}" (e.g., "100_200"). This provides:

  • Uniqueness within a frame
  • Traceability for debugging
  • Client-side face tracking capabilities

Identification Process

Threshold Configuration

The identification uses a similarity threshold of 0.4, hardcoded in process_image_sync. This is more lenient than the default APP_TOLLERANCE setting, so borderline detections are more likely to be matched to a known person.

Similarity Score Range:

  • 0.0 to 1.0 (1.0 = identical match)
  • The 0.4 threshold means matches above 40% similarity are accepted
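
In code, the threshold is passed directly to the engine (excerpted from process_image_sync, shown under API Reference):

# Matches above the 0.4 similarity threshold are accepted; others come back unmatched
identities = engine.identify(embeddings, threshold=0.4)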

Database States

  1. Database Active (feature_matrix is not None):
  2. Full identification pipeline active
  3. FAISS or NumPy similarity search performed
  4. Matched persons returned with data

  5. Database Empty (feature_matrix is None):

  6. Fallback mode: detection only
  7. Faces detected but no identification attempted
  8. All faces marked as "Unknown"
  9. Error logged for monitoring

Batch Processing

When multiple faces are detected (see the excerpt below):

  • All embeddings are extracted simultaneously
  • A single batch identification call is made for efficiency
  • Results maintain order: identities[i] corresponds to faces[i]
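
The corresponding branch of process_image_sync (full source under API Reference below) shows both the batch path and the detection-only fallback:

if engine.feature_matrix is not None:
    # Batch path: one identify() call for every face in the frame;
    # zip() preserves the face-to-identity ordering.
    embeddings = [face.embedding for face in faces]
    identities = engine.identify(embeddings, threshold=0.4)
    for (found_person, score), face in zip(identities, faces):
        found_people_list.append((found_person, face))
else:
    # Fallback: no database available, detection only.
    for face in faces:
        found_people_list.append((None, face))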

Error Handling

Image Decoding Errors

  • Cause: Invalid image format or corrupted data
  • Behavior: Function returns None, error logged
  • Client Action: Skip frame, continue with next

Database Errors

  • Cause: feature_matrix unavailable or empty
  • Behavior: Fallback to detection-only mode, error logged
  • Client Action: Receive faces with "Unknown" identity

WebSocket Disconnection

  • Cause: Client closes connection or network interruption
  • Behavior: WebSocketDisconnect exception caught, connection cleaned up
  • Logging: Info-level message recorded

Processing Errors

  • Cause: Unexpected exceptions during processing
  • Behavior: Error logged, connection remains open
  • Client Action: May receive no response for the affected frame; should handle this gracefully (see the sketch below)
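
Because the server stays silent for frames it cannot process, a client should avoid blocking forever on a reply. A minimal sketch using a receive timeout (the 0.5 s value is an assumption, not part of the API):

import asyncio
import json

async def recv_result(websocket) -> dict | None:
    """Wait briefly for a result; treat a timeout as a skipped frame."""
    try:
        response = await asyncio.wait_for(websocket.recv(), timeout=0.5)
        return json.loads(response)
    except asyncio.TimeoutError:
        return None  # server skipped the frame; continue with the next one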

Rate Limiting

Client-Side Responsibility: Rate limiting is handled on the frontend to maintain responsive UI.

Recommended Rate: 50ms intervals (20 FPS)

Example Implementation:

let lastSend = 0;
const interval = 50; // ms

// frameData: the encoded frame (e.g. an ArrayBuffer) to transmit
function sendFrame(frameData) {
  const now = Date.now();
  if (now - lastSend >= interval) {
    websocket.send(frameData);
    lastSend = now;
  }
}

Example Usage

JavaScript/WebSocket API

const ws = new WebSocket('ws://localhost:8000/ws');

ws.onopen = () => {
  console.log('WebSocket connected');

  // Start sending frames from video
  setInterval(() => {
    const canvas = document.getElementById('canvas');
    canvas.toBlob((blob) => {
      blob.arrayBuffer().then((buffer) => {
        ws.send(buffer);
      });
    }, 'image/jpeg', 0.8);
  }, 50); // 20 FPS
};

ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  console.log('Detected faces:', result.faces);

  result.faces.forEach(face => {
    if (face.name !== 'Unknown') {
      console.log(`Identified: ${face.name} ${face.surname}`);
    }
  });
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('WebSocket disconnected');
};

Python Example

import asyncio
import json

import cv2
import websockets

async def send_frames():
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as websocket:
        cap = cv2.VideoCapture(0)

        try:
            while True:
                ret, frame = cap.read()
                if not ret:
                    break

                # Encode frame as JPEG
                _, buffer = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 80])

                # Send binary data
                await websocket.send(buffer.tobytes())

                # Receive response
                response = await websocket.recv()
                result = json.loads(response)

                print(f"Faces detected: {len(result['faces'])}")

                # Rate limiting: ~20 FPS
                await asyncio.sleep(0.05)

        finally:
            cap.release()

asyncio.run(send_frames())

Implementation Details

Thread Pool Executor

The synchronous process_image_sync function is executed in a thread pool to avoid blocking the event loop:

result = await loop.run_in_executor(executor, process_image_sync, data)

Benefits:

  • Non-blocking I/O for WebSocket operations
  • CPU-intensive processing runs in a separate thread
  • The async architecture is maintained

NumPy Threading Disabled

Environment variables are set to prevent NumPy threading conflicts:

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
# ... etc

Reason: Prevents thread oversubscription and contention, ensuring consistent behavior. Note that these variables must be set before NumPy is first imported; otherwise the underlying thread pools are already initialized.

API Reference

app.routers.websocket.router module-attribute

router = APIRouter()

app.routers.websocket.websocket_endpoint async

websocket_endpoint(websocket)

WebSocket endpoint for real-time face recognition.

Accepts binary image data over WebSocket connection, processes frames asynchronously for face detection and recognition, and returns results in JSON format. Rate limiting is handled on the frontend (50ms = 20 FPS).

Parameters:

  websocket (WebSocket): FastAPI WebSocket connection instance. Required.

Raises:

  WebSocketDisconnect: When the client disconnects from the WebSocket.
  Exception: Any errors during image processing or WebSocket communication (logged).

Source code in backend/app/routers/websocket.py
@router.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    """WebSocket endpoint for real-time face recognition.

    Accepts binary image data over WebSocket connection, processes frames
    asynchronously for face detection and recognition, and returns results in JSON format.
    Rate limiting is handled on the frontend (50ms = 20 FPS).

    Args:
        websocket (WebSocket): FastAPI WebSocket connection instance.

    Raises:
        WebSocketDisconnect: When client disconnects from the WebSocket.
        Exception: Logs any errors during image processing or WebSocket communication.

    """
    await websocket.accept()
    loop = asyncio.get_event_loop()

    try:
        while True:
            data = await websocket.receive_bytes()

            # Rate limiting is handled on the frontend (50ms = 20 FPS)
            result = await loop.run_in_executor(executor, process_image_sync, data)

            if result:
                await websocket.send_json(result)

    except WebSocketDisconnect:
        logger.info("Client disconnected")
    except Exception as e:
        logger.error(f"WebSocket error: {e}")

app.routers.websocket.process_image_sync

process_image_sync(image_bytes)

Process image bytes synchronously for face detection and recognition.

Decodes image bytes, detects faces, and identifies persons using the face engine. Returns face detection results with bounding boxes and person information.

Parameters:

  image_bytes (bytes): Raw image bytes to process. Required.

Returns:

  dict | None: Dictionary containing status and the list of detected faces.
    Format: {"status": "ok", "faces": [{"id": str, "top": int, "right": int, "bottom": int, "left": int, "name": str, "surname": str, "age": int, "relationship": str, "role": str}, ...]}
    Returns None if image decoding fails.

Source code in backend/app/routers/websocket.py
def process_image_sync(image_bytes: bytes) -> dict | None:
    """Process image bytes synchronously for face detection and recognition.

    Decodes image bytes, detects faces, and identifies persons using the face engine.
    Returns face detection results with bounding boxes and person information.

    Args:
        image_bytes (bytes): Raw image bytes to process.

    Returns:
        dict | None: Dictionary containing status and list of detected faces.
            Format: {"status": "ok", "faces": [{"id": str, "top": int, "right": int, 
            "bottom": int, "left": int, "name": str, "surname": str, "age": int,
            "relationship": str, "role": str}, ...]}
            Returns None if image decoding fails.

    """
    try:
        np_arr = np.frombuffer(image_bytes, np.uint8)
        frame = cv2.imdecode(np_arr, cv2.IMREAD_COLOR)

        if frame is None:
            return None
    except Exception as e:
        logger.error(f"Errore parsing immagine: {e}")
        return None

    faces: List[Face] = engine.analyze_frame(frame)
    found_people_list: List[Tuple[Optional[Person], Face]] = []

    # No faces detected (fast exit)
    if not faces:
        return {"status": "ok", "faces": []}

    # Faces detected AND the database is active -> BATCH PROCESSING
    if engine.feature_matrix is not None:
        embeddings = [face.embedding for face in faces]
        identities = engine.identify(embeddings, threshold=0.4)

        for (found_person, score), face in zip(identities, faces):
            found_people_list.append((found_person, face))

    # Faces detected BUT no database available (fallback)
    else:
        logger.error(f"feature_matrix None ma rilevati {len(faces)} volti")
        for face in faces:
            found_people_list.append((None, face))

    faces_data = []

    for person_data, face in found_people_list:
        bbox = face.bbox.astype(int)
        left, top, right, bottom = int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])

        face_dict = {
            "id": f"{top}_{left}",
            "top": top,
            "right": right,
            "bottom": bottom,
            "left": left
        }

        if person_data is not None:
            relationship = getattr(person_data.relationship, "value", person_data.relationship)
            role = getattr(person_data.role, "value", person_data.role)

            face_dict.update({
                "name": person_data.name,
                "surname": person_data.surname,
                "age": person_data.age, 
                "relationship": relationship,
                "role": role,
            })
        else:
            face_dict.update({
                "name": "Unknown",
                "surname": None,
                "age": 0,
                "relationship": None,
                "role": None,
            })

        faces_data.append(face_dict)

    return {"status": "ok", "faces": faces_data}