How to Build Automated GDPR-Compliant Video Redaction with TwelveLabs

Hrishikesh Yadav

This tutorial walks through building a production-ready GDPR video redaction application on TwelveLabs — combining Marengo 3.0 for entity-based retrieval, Pegasus 1.5 for structured timestamped privacy metadata, and a local face detection pipeline for stable identity-locked blur tracks. The result takes a reviewer from upload to audit-ready redacted output in a single interface, with documented rationale at every decision point.

May 28, 2026

13 Minutes

Introduction

Privacy review at scale is not an editing problem. It is an intelligence problem.

When a legal request lands and your team needs to locate, assess, and redact every appearance of a specific person across hours of footage, frame-by-frame review does not work. The GDPR Enforcement Tracker recorded more than €6.06 billion in cumulative fines across 2,793 cases by March 2026. Article 83 still permits penalties of up to €20 million or 4% of global annual turnover, whichever is higher, for the most serious violations. The operational pressure is real, and manual workflows are not built to absorb it.

GDPR-compliant redaction requires three things working together: accurate identification of the right person or object, retrieval of every relevant appearance across a corpus, and redaction limited to what the legal purpose actually requires. Blurring everything is not a defensible strategy. The regulation's data minimization principle pushes toward precision, not blanket suppression.

This tutorial walks through a production-ready GDPR video redaction application built on TwelveLabs. The application uses Marengo 3.0 for multimodal search and entity-based retrieval, Pegasus 1.5 for structured privacy metadata, and a local face detection pipeline for frame-level identity clustering. The result is a workflow that takes a reviewer from video upload to export-ready redacted output in a single interface.

You can explore the live deployment at tl-gdpr-compliance.vercel.app and find the source code at github.com/Hrishikesh332/tl-GDPR-compliance-redaction.


What the Application Does

Most redaction tools are built around manual selection: a reviewer watches footage, draws bounding boxes, and exports a blurred clip. That approach does not scale when the corpus is large, the person appears across multiple clips, or the legal obligation requires documented reasoning for each redaction decision.

This application approaches the problem differently. TwelveLabs serves as the video understanding layer throughout the workflow. Reviewers search across indexed footage using text, images, or registered identity entities. Pegasus 1.5 generates timestamped privacy metadata that surfaces high-risk moments before a reviewer watches a single frame. Local face detection clusters detections into stable per-person identities that persist through the export pipeline. Blur tracks follow a specific person across motion, profile turns, and camera cuts.

The workflow covers the full review cycle: upload and index footage, search for subjects using natural language or face images, review AI-generated privacy risk segments on an interactive timeline, select redaction targets by person identity, and export a stable, face-locked redacted video.


Inside the Redaction Pipeline

The application connects four distinct capabilities into a single review workflow.

Multimodal search with Marengo 3.0 lets reviewers locate any person, object, or scene using text queries, image uploads, or registered face entities. Marengo returns relevant clip segments with confidence scores, which the interface renders as a visual timeline lane with review markers at the strongest matches.

Privacy metadata with Pegasus 1.5 generates timestamped segment data focused on privacy risk. Rather than describing the full video, Pegasus is given a schema that targets specific categories: faces, documents, license plates, screens, sensitive objects, and protected individuals. Each segment includes a risk level, redaction reason, recommended action, and scene role.

Local detection and identity clustering uses Pegasus timestamps to guide keyframe extraction. The application then runs local face detection and InsightFace embeddings, clustering detected faces into stable per-person identities with consistent IDs across the full video.

Face-lock blur and export redaction converts the selected person's detection history into a frame-indexed bounding box lane. During export, the renderer applies blur from that lane on each matching frame, producing a stable, identity-aware redacted video.


Setting Up the Environment

Before building, complete the following prerequisites.

  1. Create a TwelveLabs account and generate an API key. Create an index with Marengo 3.0 and Pegasus 1.5 enabled, and record the Index ID.

  2. Create a TwelveLabs Entity Collection to support face registration and entity search.

  3. Install Python 3 for the Flask backend and Node.js/npm for the React frontend. FFmpeg is recommended for video re-encoding on export.

  4. Clone the repository and follow the setup instructions in the README.

Create backend/.env using the variables defined in .env.example.
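
A missing key usually surfaces late, as a failed API call in the middle of an upload. The sketch below checks the configuration at startup instead. The variable names are hypothetical placeholders, not the names used by the repository; substitute whatever backend/.env.example actually defines.

```python
import os

# Hypothetical variable names for illustration only; the real names are
# whatever backend/.env.example defines.
REQUIRED_VARS = ("TWELVELABS_API_KEY", "TWELVELABS_INDEX_ID")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing configuration: {', '.join(missing)}")
    print("Environment looks complete.")
```

Passing a plain dictionary as env makes the check easy to exercise without touching the real environment.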


Section 1: Entity Management with TwelveLabs

Locating a specific known person across indexed footage requires more than a text search. Registering a face as a TwelveLabs entity creates a reusable identity reference that can be combined with context queries: find this person near a document, in a specific scene type, or performing a specific action.

This matters for redaction precision. Instead of flagging every face in the video, the reviewer can anchor the search to a known identity and retrieve only the moments where that person is visible.


1.1 - Registering Face Assets in the Entity Index

When the user uploads a face image and provides a name, the backend detects the face using a ResNet-10 Face Detector, generates a preview crop, and prepares a face asset for registration with TwelveLabs.

The flow has two steps: upload the face image as an asset, then create a named entity referencing the returned asset ID. This links the visual reference to a searchable identity.

asset_id = twelvelabs_service.upload_face_asset(tmp.name)
metadata = {"name": name}
if preview_base64:
    metadata["face_snap_base64"] = preview_base64

entity_result = twelvelabs_service.create_entity(
    name=name,
    asset_ids=[asset_id],
    description=description or f"Face entity: {name}",
    metadata=metadata,
)

After registration, the entity is available for identity-constrained search across any indexed video in the collection.


1.2 - Searching with Entity and Text

Search supports three modes: entity-based, text-based, and image-based. Reviewers can combine them depending on what they know at the start of the review.

For entity-based search, the backend wraps the entity ID in TwelveLabs' mention format (<@entity_id>). When the user adds a text qualifier, both are combined so the search is constrained by both identity and context.

backend/services/twelvelabs_services (Line 1214)

def entity_search(entity_id, query_suffix="", index_id=None):
    client = get_client()
    idx = resolve_index_id(index_id)

    query_text = f"<@{entity_id}>"
    if query_suffix:
        query_text = f"<@{entity_id}> {query_suffix}"

    logger.info("Entity search: %s", query_text)
    response = client.search.query(
        index_id=idx,
        search_options=["visual"],
        query_text=query_text,
        group_by="video",
        sort_option="score",
        page_limit=50,
    )
    return serialize_search_results(response)

For image-based search, the backend passes the image as query_media_type="image" with either query_media_file or query_media_url.

# Use a hosted image when a URL is provided; otherwise upload the local file.
if image_url:
    response = client.search.query(
        **kwargs,
        query_media_type="image",
        query_media_url=image_url,
    )
elif image_path:
    with open(image_path, "rb") as image_file:
        response = client.search.query(
            **kwargs,
            query_media_type="image",
            query_media_file=image_file,
        )


1.3 - Search Timeline Lane and Review Markers

Search results are rendered not just as a ranked list but as a visual timeline lane. Each result segment has a start and end time, which the frontend maps onto the video scrubber. High-confidence matches are marked with red review indicators so reviewers can distinguish the strongest hits from surrounding context.

This turns search into a review workflow. The reviewer searches, inspects the timeline, and jumps directly to the moments most likely to require a redaction decision.
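
The mapping from result segments to the scrubber can be sketched as a small pure function. This is an illustration, not the frontend's actual code (which is React): the segment keys start, end, and score, the 0.8 review threshold, and the function name are all assumptions.

```python
def build_timeline_markers(segments, duration_sec, review_threshold=0.8):
    """Convert search segments into percentage offsets along the scrubber.

    Each segment is assumed to be a dict with "start", "end" (seconds) and
    "score" keys. Segments at or above review_threshold are flagged so the
    UI can draw a review indicator on them.
    """
    markers = []
    if duration_sec <= 0:
        return markers
    for seg in segments:
        # Clamp to the video bounds so a bad timestamp cannot overflow the lane.
        start = max(0.0, min(seg["start"], duration_sec))
        end = max(start, min(seg["end"], duration_sec))
        markers.append({
            "left_pct": round(100.0 * start / duration_sec, 2),
            "width_pct": round(100.0 * (end - start) / duration_sec, 2),
            "needs_review": seg["score"] >= review_threshold,
        })
    return markers
```

Expressing positions as percentages keeps the markers valid when the scrubber is resized.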


Section 2: Detection and Face-Lock Redaction

Detection connects TwelveLabs video understanding to the local computer vision pipeline. Rather than running face detection on every frame, Pegasus 1.5 first generates structured metadata about where people appear in the video. The backend uses those timestamps to guide keyframe extraction, making the detection pass faster and more targeted.

Detected faces are clustered into stable per-person identities. The reviewer selects from those identities, not from individual bounding boxes. The selection drives the face-lock lane used for export redaction.
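
To make the timestamp-guided sampling concrete, here is a minimal sketch that turns time ranges into frame indices for the detector. The function name, the (start_sec, end_sec) tuple format, and the 0.5 second sampling interval are assumptions; the application's own sampling strategy may differ.

```python
def sample_keyframe_indices(time_ranges, fps, total_frames, interval_sec=0.5):
    """Pick frame indices to run face detection on.

    time_ranges is a list of (start_sec, end_sec) windows where people are
    expected to appear. Frames outside those windows are never sampled, and
    overlapping windows do not produce duplicate indices.
    """
    step = max(1, int(round(interval_sec * fps)))
    indices = set()
    for start_sec, end_sec in time_ranges:
        first = max(0, int(start_sec * fps))
        last = min(total_frames - 1, int(end_sec * fps))
        indices.update(range(first, last + 1, step))
    return sorted(indices)
```

For a 10 fps clip with windows (0.0, 1.0) and (0.5, 1.5), this samples four frames rather than all sixteen in that span.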


2.1 - Combining TwelveLabs Context with Local Face Detection

The pipeline begins with a Pegasus structured analysis task. The schema defines two segment types: face_redaction_target (for people who may require redaction) and scene_segment (for broader scene context). Setting temperature: 0.1 keeps the output consistent and suitable for automated pipelines.

backend/services/twelvelabs_services (Line 102)

PIPELINE_METADATA_RESPONSE_FORMAT = {
    "type": "segment_definitions",
    "segment_definitions": [
        {
            "id": "face_redaction_target",
            "description": (
                "Return people segments for face redaction decisions. Create one segment per "
                "distinct face/person for each continuous time range where their face is visible "
                "enough to matter for redaction."
            ),
            "fields": [
                {"name": "name", "type": "string"},
                {"name": "description", "type": "string"},
                {"name": "should_anonymize", "type": "boolean"},
                {"name": "is_official", "type": "boolean"},
                {"name": "review_required", "type": "boolean"},
                {"name": "redaction_reason", "type": "string"},
                {"name": "confidence", "type": "number"},
            ],
        },
        {
            "id": "scene_segment",
            "fields": [
                {"name": "description", "type": "string"},
                {"name": "confidence", "type": "number"},
            ],
        },
    ],
}

# Request body for the Pegasus structured-analysis task
body = {
    "video": {
        "type": "asset_id",
        "asset_id": asset_id,
    },
    "model_name": "pegasus1.5",
    "analysis_mode": "time_based_metadata",
    "response_format": PIPELINE_METADATA_RESPONSE_FORMAT,
    "temperature": 0.1,
}

Once Pegasus returns the people metadata, the application extracts the time ranges and samples keyframes from those windows. Each keyframe passes through detect_faces(..., with_encodings=True), which uses InsightFace to locate faces, calculate bounding boxes, assess sharpness, and generate identity embeddings.

for kf in keyframes:
    faces = detect_faces(kf["frame"], with_encodings=True)
    for f in faces:
        f["frame_idx"] = kf["frame_idx"]
        f["timestamp"] = kf["timestamp"]
        all_faces.append(f)

Each detection stores frame index and timestamp alongside the embedding, creating time-aware redaction metadata that the clustering step can group into consistent person identities.
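
The tutorial does not show the clustering step itself. One common way to group embeddings into identities is greedy clustering on cosine similarity, sketched below in pure Python. This is an assumed technique, not necessarily what the repository does; the 0.5 threshold, the person_N ID format, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_faces(faces, threshold=0.5):
    """Greedily assign each detection to the most similar existing identity.

    faces is a list of dicts with an "encoding" vector. A detection joins the
    best-matching cluster when similarity reaches the threshold; otherwise it
    starts a new identity. Centroids are kept as running means.
    """
    clusters = []
    for face in faces:
        emb = face["encoding"]
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine_similarity(emb, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            best = {
                "person_id": f"person_{len(clusters)}",
                "centroid": list(emb),
                "members": [],
            }
            clusters.append(best)
        else:
            n = len(best["members"])
            best["centroid"] = [
                (c * n + e) / (n + 1) for c, e in zip(best["centroid"], emb)
            ]
        best["members"].append(face)
    return clusters
```

Because each member keeps its frame_idx and timestamp, a cluster doubles as the appearance history for that identity.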


2.2 - Selecting the Redaction Target

After detection and clustering, the reviewer selects a person identity from the list of detected individuals. That selection resolves into a list of face targets and their stored embeddings, which the export pipeline uses for face-lock tracking.

if person_ids:
    enriched = get_enriched_faces(job_id) or {}
    unique_faces = job.get("unique_faces") or enriched.get("unique_faces", [])
    for index, face in enumerate(unique_faces):
        stable_person_id = ensure_face_identity(face, fallback_index=index)
        if stable_person_id not in person_ids:
            continue
        face_targets.append(face)
        encoding = face.get("encoding")
        if encoding:
            face_encodings.append(encoding)
        matched_ids.append(stable_person_id)

Reverse redaction uses the same identity resolution in the opposite direction: the reviewer selects the person to preserve, and all other detected faces are blurred in the export.
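
The two selection modes differ only in which side of the identity match gets blurred. A minimal sketch, assuming each face dict carries a person_id key (the function name and data shape are illustrative, not the repository's):

```python
def select_blur_targets(unique_faces, person_ids, reverse=False):
    """Return the faces that should be blurred in the export.

    Normal mode blurs the selected identities. Reverse mode preserves the
    selected identities and blurs every other detected face.
    """
    selected = set(person_ids)
    return [
        face for face in unique_faces
        if (face["person_id"] in selected) != reverse
    ]
```

With faces a, b, and c and a selection of b, normal mode blurs only b, while reverse mode blurs a and c.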


2.3 - Building Face-Lock Lanes and Rendering the Redacted Export

For each selected person identity, the pipeline calls build_face_lock_lane(job_id, person_id). The lane builder draws on three sources: stored InsightFace appearance records, TwelveLabs entity search time ranges, and saved semantic person ranges from Pegasus. The result is a lane document covering the full span of that identity across the video.

face_lock_tracks = {}
if face_targets and not reverse_face_redaction:
    from services.face_lock_track import build_face_lock_lane

    for face in face_targets:
        person_id = get_face_identity(face)
        if not person_id:
            continue

        lane_doc = build_face_lock_lane(job_id, person_id)
        if lane_doc:
            face_lock_tracks[person_id] = lane_doc

# Excerpt from the lane builder: gather the sources that feed the lane
appearances = collect_person_appearances(selected_face)

video_id = str(job.get("twelvelabs_video_id") or "").strip()
entity_ranges = get_entity_search_ranges(selected_face, video_id)
saved_person_ranges = get_face_semantic_time_ranges(selected_face)
semantic_ranges = entity_ranges + saved_person_ranges

segments = build_face_lock_segments(
    appearances, semantic_ranges, fps, total_frames, duration_sec,
)

During export, the lane is converted into a frame-indexed bounding box lookup table (face_lock_bboxes_by_frame) and guided with YOLOv8-Face refinement. The renderer checks each frame against this table and applies apply_detection_redaction wherever a face-lock bounding box is registered.

if face_lock_bboxes_by_frame and not preview_only:
    for entry in face_lock_bboxes_by_frame.get(frame_idx, ()):
        lane_bbox = entry.get("bbox")
        if lane_bbox:
            apply_detection_redaction(frame, lane_bbox, "face")

The output is a stable blur that tracks the selected identity through motion, partial occlusion, and camera movement without requiring per-frame manual correction.
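
Detections exist only on sampled keyframes, so a per-frame lookup table has to fill the frames in between. The sketch below does this with linear interpolation between consecutive detections of one identity. It is an assumed approach for illustration: the function names, the (x, y, w, h) box format, and the max_gap cutoff are not taken from the repository, which also applies YOLOv8-Face refinement that is not modeled here.

```python
from collections import defaultdict


def interpolate_bbox(a, b, t):
    """Linearly blend two (x, y, w, h) boxes; t=0 gives a, t=1 gives b."""
    return tuple(int(round(x + (y - x) * t)) for x, y in zip(a, b))


def build_bboxes_by_frame(person_id, detections, max_gap=30):
    """Build a frame-indexed lookup of blur boxes for one identity.

    detections is a list of (frame_idx, bbox) pairs. Consecutive detections
    closer than max_gap frames are bridged by interpolation; larger gaps are
    left empty so the blur does not drift across frames where the person
    may be off screen.
    """
    table = defaultdict(list)
    ordered = sorted(detections, key=lambda d: d[0])
    for (f0, b0), (f1, b1) in zip(ordered, ordered[1:]):
        if f1 - f0 > max_gap:
            table[f0].append({"person_id": person_id, "bbox": tuple(b0)})
            continue
        for frame in range(f0, f1):
            t = (frame - f0) / (f1 - f0)
            table[frame].append(
                {"person_id": person_id, "bbox": interpolate_bbox(b0, b1, t)}
            )
    if ordered:
        last_frame, last_box = ordered[-1]
        table[last_frame].append({"person_id": person_id, "bbox": tuple(last_box)})
    return dict(table)
```

The renderer can then look up a frame index in the returned dictionary and blur every box registered for it.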


Section 3: Privacy Metadata with Pegasus 1.5

Pegasus 1.5 supports schema-driven time-based metadata, which means the application can define exactly what type of privacy risk to detect and what structured fields to return for each segment. This is the mechanism that powers Meta Insights in the review interface.


3.1 - Structuring the Privacy Risk Schema

The schema defines a single segment type, privacy_risk_segment, which keeps the Pegasus output focused on actionable review targets: faces, documents, screens, license plates, sensitive text, and protected individuals. Fields like risk_level, scene_role, redaction_decision, and reason give reviewers documented rationale for each flagged moment, not just a timestamp. 

backend/services/pegasus_privacy (Line 75)

PEGASUS_RESPONSE_FORMAT = {
    "type": "segment_definitions",
    "segment_definitions": [
        {
            "id": "privacy_risk_segment",
            "description": (
                f"{PEGASUS_PRIVACY_PROMPT} Do not create broad background or crowd segments. Each "
                "segment must be narrow, actionable, and tied to one visible target that should be "
                "redacted or reviewed with care."
            ),
            "fields": [
                {
                    "name": "privacy_category",
                    "type": "string",
                    "description": (
                        "One of person, face, screen, document, text, license_plate, logo, object, scene. "
                        "Use scene only when the whole frame contains sensitive material; do not use it "
                        "for ordinary courtroom background."
                    ),
                    "enum": ["person", "face", "screen", "document", "text", "license_plate", "logo", "object", "scene"],
                },
                {
                    "name": "risk_level",
                    "type": "string",
                    "description": "One of low, medium, high.",
                    "enum": ["low", "medium", "high"],
                },
                {
                    "name": "label",
                    "type": "string",
                    "description": "Short target name, for example Main verdict subject, Protected witness, Visible ID, Phone screen, or License plate.",
                },
                {
                    "name": "description",
                    "type": "string",
                    "description": "What is visible and why this exact target needs redaction or careful review.",
                },
                {
                    "name": "reason",
                    "type": "string",
                    "description": (
                        "Specific reason this item should be redacted. For courtroom people, state why this is "
                        "the main verdict subject or another protected/private person; do not include generic "
                        "courtroom observers."
                    ),
                },
                {
                    "name": "scene_role",
                    "type": "string",
                    "description": (
                        "Role of the target in context. Use verdict_subject, defendant, respondent, or accused for "
                        "the main person whose verdict is being discussed. Ordinary judges, lawyers, clerks, officers, "
                        "jury, audience, reporters, and bystanders should not be segmented."
                    ),
                    "enum": [
                        "verdict_subject",
                        "defendant",
                        "respondent",
                        "accused",
                        "protected_witness",
                        "victim",
                        "minor",
                        "private_non_party",
                        "sensitive_item",
                        "unknown",
                    ],
                },
                # More fields defined ...
            ],
        }
    ],
}


3.2 - Generating Timestamped Privacy Segments

The schema is passed to Pegasus with analysis_mode: "time_based_metadata". This instructs Pegasus to return the response as a structured timeline rather than a single document summary. Setting temperature: 0.1 reduces run-to-run variation, which is important for compliance workflows where consistency across repeated runs matters. A low temperature does not guarantee identical output, so every generated event is still marked for reviewer confirmation.

backend/services/twelvelabs_services (Line 1044)

def create_pegasus_privacy_task(asset_id, *, response_format):
    """Create a Pegasus 1.5 async structured-analysis task from an existing asset id."""
    body = {
        "video": {
            "type": "asset_id",
            "asset_id": asset_id,
        },
        "model_name": "pegasus1.5",
        "analysis_mode": "time_based_metadata",
        "response_format": response_format,
        "temperature": 0.1,
    }

The backend saves an initial job artifact with empty timeline_events and recommended_actions fields, then polls until the task completes. Each Pegasus segment is converted into two objects: a timeline event (with start_sec, end_sec, severity, category, reason, and redaction_decision) and a recommended action that tells the reviewer what to do next.

event = {
    "id": event_id,
    "start_sec": round(start_sec, 3),
    "end_sec": round(end_sec, 3),
    "severity": severity,
    "category": category,
    "label": label[:120],
    "description": description[:600],
    "reason": reason[:600],
    "redaction_target": redaction_target[:120] or None,
    "scene_role": scene_role[:120] or None,
    "redaction_decision": redaction_decision[:120] or None,
    "subject_selection": subject_selection[:120] or None,
    "confidence": round(confidence, 3),
    "review_required": True,
    "recommended_action_ids": [action_id],
}
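
The polling step can be sketched independently of the API client. In the example below, fetch_status is a hypothetical callable that returns the current task as a dict, and the status strings "ready" and "failed" are assumptions; check the TwelveLabs task response for the actual values before relying on them.

```python
import time


def poll_until_done(fetch_status, interval_sec=5.0, timeout_sec=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status repeatedly until the task finishes or times out.

    fetch_status is a hypothetical callable returning a dict with a "status"
    key. The "ready" and "failed" values are assumed, not confirmed API
    constants. sleep and clock are injectable so the loop can be tested
    without waiting.
    """
    deadline = clock() + timeout_sec
    while True:
        task = fetch_status()
        status = task.get("status")
        if status == "ready":
            return task
        if status == "failed":
            raise RuntimeError(f"Analysis task failed: {task}")
        if clock() >= deadline:
            raise TimeoutError("Analysis task did not finish in time")
        sleep(interval_sec)
```

A fixed interval is the simplest choice; a backoff schedule may be a better fit for long videos.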


3.3 - Rendering Privacy Metadata as a Review Interface

The timeline lane renders each Pegasus event as a clickable hotspot positioned at its start_sec and end_sec. Clicking a hotspot opens the Meta Insights panel, focuses the event, and seeks the video to that timestamp.

The sidebar surfaces the full event metadata: severity, category, reason, redaction decision, redaction target, subject selection, scene role, confidence, and handling note. Reviewers see not just where something requires attention, but why, which is the documented rationale a GDPR audit requires.


Section 4: Open-Ended Video Analysis for Review

In addition to structured metadata, the application supports free-form contextual questions against the indexed video. A reviewer can ask what sensitive information appears throughout the footage, which moments carry the highest compliance risk, or what is happening around a specific timestamp. This is useful for situations where the reviewer does not yet know what they are looking for.

The frontend sends the video_id and the reviewer's prompt to /api/analyze-custom. The backend prepends a formatting instruction that requests timestamps whenever specific moments are referenced, then passes the combined prompt to the TwelveLabs Analyze service.

backend/services/twelvelabs_services (Line 666)

def analyze_video_custom(video_id, prompt):
    client = get_client()
    logger.info("Custom analysis on video %s", video_id)
    enhanced_prompt = f"{ANALYZE_FORMAT_INSTRUCTION}\n\n{prompt}"
    result = client.analyze(
        video_id=video_id,
        prompt=enhanced_prompt,
        temperature=0.2,
        request_options={"timeout_in_seconds": TWELVELABS_ANALYZE_TIMEOUT_SEC},
    )
    return {"data": result.data, "id": result.id}

The response returns a plain-language analysis grounded in the video content, with timestamps linking the reviewer directly to the relevant moments. This closes the gap between "I need to find something sensitive" and "here is exactly where to look."
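
To turn the timestamps in a free-form answer into seek targets, the text has to be parsed. The sketch below assumes timestamps appear as mm:ss or hh:mm:ss; the actual format depends on ANALYZE_FORMAT_INSTRUCTION, which is not shown in this tutorial, so the pattern may need adjusting.

```python
import re

# Assumed format: optional hours, then minutes and two-digit seconds.
TIMESTAMP_RE = re.compile(r"\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b")


def extract_timestamps(text):
    """Return the distinct timestamps found in text, in seconds, in order."""
    seconds = []
    for hours, minutes, secs in TIMESTAMP_RE.findall(text):
        total = int(hours or 0) * 3600 + int(minutes) * 60 + int(secs)
        if total not in seconds:
            seconds.append(total)
    return seconds
```

Each returned value can be passed to the player's seek control, so a sentence in the analysis becomes a clickable jump point.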


What This Approach Makes Possible

GDPR compliance for video has historically required one of two tradeoffs: either a slow, expensive manual review process, or a blunt automated approach that redacts more than the law requires and cannot explain its decisions.

This application takes a different path. Marengo 3.0 retrieves the right moments across an entire corpus using text, images, or registered identity entities. Pegasus 1.5 generates structured, timestamped privacy metadata with documented rationale for each flagged segment. Local face detection clusters identities from AI-guided keyframes. Face-lock lanes maintain stable blur tracks across the full export.

The result is a redaction workflow that is targeted rather than broad, documented rather than opaque, and scalable rather than manual. Every decision point has a reviewable record, which is what defensible compliance actually requires.

Resources

Introduction

Privacy review at scale is not an editing problem. It is an intelligence problem.

When a legal request lands and your team needs to locate, assess, and redact every appearance of a specific person across hours of footage, frame-by-frame review does not work. The GDPR Enforcement Tracker recorded more than €6.06 billion in cumulative fines across 2,793 cases by March 2026. Article 83 still permits penalties up to €20 million or 4% of global annual turnover for the most serious violations. The operational pressure is real, and manual workflows are not built to absorb it.

GDPR-compliant redaction requires three things working together: accurate identification of the right person or object, retrieval of every relevant appearance across a corpus, and redaction limited to what the legal purpose actually requires. Blurring everything is not a defensible strategy. The regulation's data minimization principle pushes toward precision, not blanket suppression.

This tutorial walks through a production-ready GDPR video redaction application built on TwelveLabs. The application uses Marengo 3.0 for multimodal search and entity-based retrieval, Pegasus 1.5 for structured privacy metadata, and a local face detection pipeline for frame-level identity clustering. The result is a workflow that takes a reviewer from video upload to export-ready redacted output in a single interface.

You can explore the live deployment at tl-gdpr-compliance.vercel.app and find the source code at github.com/Hrishikesh332/tl-GDPR-compliance-redaction.


What the Application Does

Most redaction tools are built around manual selection: a reviewer watches footage, draws bounding boxes, and exports a blurred clip. That approach does not scale when the corpus is large, the person appears across multiple clips, or the legal obligation requires documented reasoning for each redaction decision.

This application approaches the problem differently. TwelveLabs serves as the video understanding layer throughout the workflow. Reviewers search across indexed footage using text, images, or registered identity entities. Pegasus 1.5 generates timestamped privacy metadata that surfaces high-risk moments before a reviewer watches a single frame. Local face detection clusters detections into stable per-person identities that persist through the export pipeline. Blur tracks follow a specific person across motion, profile turns, and camera cuts.

The workflow covers the full review cycle: upload and index footage, search for subjects using natural language or face images, review AI-generated privacy risk segments on an interactive timeline, select redaction targets by person identity, and export a stable, face-locked redacted video.


Inside the Redaction Pipeline

The application connects four distinct capabilities into a single review workflow.

Multimodal search with Marengo 3.0 lets reviewers locate any person, object, or scene using text queries, image uploads, or registered face entities. Marengo returns relevant clip segments with confidence scores, which the interface renders as a visual timeline lane with review markers at the strongest matches.

Privacy metadata with Pegasus 1.5 generates timestamped segment data focused on privacy risk. Rather than describing the full video, Pegasus is given a schema that targets specific categories: faces, documents, license plates, screens, sensitive objects, and protected individuals. Each segment includes a risk level, redaction reason, recommended action, and scene role.

Local detection and identity clustering uses Pegasus timestamps to guide keyframe extraction. The application then runs local face detection and InsightFace embeddings, clustering detected faces into stable per-person identities with consistent IDs across the full video.

Face-lock blur and export redaction converts the selected person's detection history into a frame-indexed bounding box lane. During export, the renderer applies blur from that lane on each matching frame, producing a stable, identity-aware redacted video.


Setting Up the Environment

Before building, complete the following prerequisites.

  1. Create a TwelveLabs account and generate an API key. Create an index with Marengo 3.0 and Pegasus 1.5 enabled, and record the Index ID.

  2. Create a TwelveLabs Entity Collection to support face registration and entity search.

  3. Install Python 3 for the Flask backend and Node.js/npm for the React frontend. FFmpeg is recommended for video re-encoding on export.

  4. Clone the repository and follow the setup instructions in the README.

Create backend/.env using the variables defined in .env.example.


Section 1: Entity Management with TwelveLabs

Locating a specific known person across indexed footage requires more than a text search. Registering a face as a TwelveLabs entity creates a reusable identity reference that can be combined with context queries: find this person near a document, in a specific scene type, or performing a specific action.

This matters for redaction precision. Instead of flagging every face in the video, the reviewer can anchor the search to a known identity and retrieve only the moments where that person is visible.


1.1 - Registering Face Assets in the Entity Index

When the user uploads a face image and provides a name, the backend detects the face using a ResNet-10 Face Detector, generates a preview crop, and prepares a face asset for registration with TwelveLabs.

The flow has two steps: upload the face image as an asset, then create a named entity referencing the returned asset ID. This links the visual reference to a searchable identity.

asset_id = twelvelabs_service.upload_face_asset(tmp.name)
metadata = {"name": name}
if preview_base64:
    metadata["face_snap_base64"] = preview_base64

entity_result = twelvelabs_service.create_entity(
    name=name,
    asset_ids=[asset_id],
    description=description or f"Face entity: {name}",
    metadata=metadata,

After registration, the entity is available for identity-constrained search across any indexed video in the collection.


1.2 - Searching with Entity and Text

Search supports three modes: entity-based, text-based, and image-based. Reviewers can combine them depending on what they know at the start of the review.

For entity-based search, the backend wraps the entity ID in TwelveLabs' mention format (<@entity_id>). When the user adds a text qualifier, both are combined so the search is constrained by both identity and context.

backend/services/twelvelabs_services (Line 1214)

def entity_search(entity_id, query_suffix="", index_id=None):
    client = get_client()
    idx = resolve_index_id(index_id)

    query_text = f"<@{entity_id}>"
    if query_suffix:
        query_text = f"<@{entity_id}> {query_suffix}"

    logger.info("Entity search: %s", query_text)
    response = client.search.query(
        index_id=idx,
        search_options=["visual"],
        query_text=query_text,
        group_by="video",
        sort_option="score",
        page_limit=50,
    )
    return serialize_search_results(response)

For image-based search, the backend passes the image as query_media_type="image" with either query_media_file or query_media_url.

if image_url:
    response = client.search.query(
        **kwargs,
        query_media_type="image",
        query_media_url=image_url,
    )

if image_path:
    with open(image_path, "rb") as image_file:
        response = client.search.query(
            **kwargs,
            query_media_type="image",
            query_media_file=image_file,
        )


1.3 - Search Timeline Lane and Review Markers

Search results are rendered not just as a ranked list but as a visual timeline lane. Each result segment has a start and end time, which the frontend maps onto the video scrubber. High-confidence matches are marked with red review indicators so reviewers can distinguish the strongest hits from surrounding context.

This turns search into a review workflow. The reviewer searches, inspects the timeline, and jumps directly to the moments most likely to require a redaction decision.
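
The mapping from result segments to scrubber positions can be sketched as follows. This is a minimal illustration, not the application's frontend code: the segment keys (start, end, score), the Marker class, and the 0.8 review threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    left_pct: float     # horizontal offset on the scrubber, 0-100
    width_pct: float    # marker width on the scrubber, 0-100
    needs_review: bool  # True for the strongest matches


def to_timeline_markers(segments, duration_sec, review_threshold=0.8):
    """Map search segments (start, end, score) onto scrubber percentages."""
    if duration_sec <= 0:
        return []
    markers = []
    for seg in segments:
        # Clamp to the video bounds so malformed ranges cannot overflow the lane.
        start = min(max(seg["start"], 0.0), duration_sec)
        end = min(max(seg["end"], start), duration_sec)
        markers.append(
            Marker(
                left_pct=round(100.0 * start / duration_sec, 2),
                width_pct=round(100.0 * (end - start) / duration_sec, 2),
                # Hypothetical threshold; tune it to the score scale in use.
                needs_review=seg["score"] >= review_threshold,
            )
        )
    return markers
```

Expressing positions as percentages keeps the lane independent of the player's pixel width, so the same markers stay valid when the interface is resized.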


Section 2: Detection and Face-Lock Redaction

Detection connects TwelveLabs video understanding to the local computer vision pipeline. Rather than running face detection on every frame, the pipeline first has Pegasus 1.5 generate structured metadata about where people appear in the video. The backend uses those timestamps to guide keyframe extraction, making the detection pass faster and more targeted.

Detected faces are clustered into stable per-person identities. The reviewer selects from those identities, not from individual bounding boxes. The selection drives the face-lock lane used for export redaction.
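
Timestamp-guided keyframe extraction can be sketched as below. The function name, the (start_sec, end_sec) tuple format, and the 0.5-second sampling step are assumptions for illustration; the application's actual sampling policy may differ.

```python
def sample_keyframe_indices(time_ranges, fps, total_frames, step_sec=0.5):
    """Return sorted, de-duplicated frame indices sampled inside each range."""
    indices = set()
    for start_sec, end_sec in time_ranges:
        if end_sec < start_sec:
            continue  # skip malformed ranges
        n_steps = int((end_sec - start_sec) / step_sec)
        for i in range(n_steps + 1):
            t = start_sec + i * step_sec
            frame_idx = int(round(t * fps))
            # Ignore samples that fall outside the decoded video.
            if 0 <= frame_idx < total_frames:
                indices.add(frame_idx)
    return sorted(indices)
```

Collecting indices in a set means overlapping people segments do not cause the same frame to be decoded and scanned twice.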


2.1 - Combining TwelveLabs Context with Local Face Detection

The pipeline begins with a Pegasus structured analysis task. The schema defines two segment types: face_redaction_target (for people who may require redaction) and scene_segment (for broader scene context). Setting temperature: 0.1 keeps the output consistent and suitable for automated pipelines.

backend/services/twelvelabs_services (Line 102)

PIPELINE_METADATA_RESPONSE_FORMAT = {
    "type": "segment_definitions",
    "segment_definitions": [
        {
            "id": "face_redaction_target",
            "description": (
                "Return people segments for face redaction decisions. Create one segment per "
                "distinct face/person for each continuous time range where their face is visible "
                "enough to matter for redaction."
            ),
            "fields": [
                {"name": "name", "type": "string"},
                {"name": "description", "type": "string"},
                {"name": "should_anonymize", "type": "boolean"},
                {"name": "is_official", "type": "boolean"},
                {"name": "review_required", "type": "boolean"},
                {"name": "redaction_reason", "type": "string"},
                {"name": "confidence", "type": "number"},
            ],
        },
        {
            "id": "scene_segment",
            "fields": [
                {"name": "description", "type": "string"},
                {"name": "confidence", "type": "number"},
            ],
        },
    ],
}

body = {
    "video": {
        "type": "asset_id",
        "asset_id": asset_id,
    },
    "model_name": "pegasus1.5",
    "analysis_mode": "time_based_metadata",
    "response_format": PIPELINE_METADATA_RESPONSE_FORMAT,
    "temperature": 0.1,
}

Once Pegasus returns the people metadata, the application extracts the time ranges and samples keyframes from those windows. Each keyframe passes through detect_faces(..., with_encodings=True), which uses InsightFace to locate faces, calculate bounding boxes, assess sharpness, and generate identity embeddings.

for kf in keyframes:
    faces = detect_faces(kf["frame"], with_encodings=True)
    for f in faces:
        f["frame_idx"] = kf["frame_idx"]
        f["timestamp"] = kf["timestamp"]
        all_faces.append(f)

Each detection stores frame index and timestamp alongside the embedding, creating time-aware redaction metadata that the clustering step can group into consistent person identities.
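
The clustering step itself is not shown in this tutorial. As a rough sketch of how embeddings could be grouped into per-person identities, the example below uses greedy centroid clustering on cosine similarity. The algorithm choice, the 0.5 threshold, and the person_N ID format are assumptions, not the application's confirmed method.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_faces(detections, threshold=0.5):
    """Assign each detection to the most similar cluster whose centroid
    clears the threshold; otherwise start a new person cluster."""
    clusters = []  # each: {"person_id", "centroid", "members"}
    for det in detections:
        emb = det["encoding"]
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine_similarity(emb, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({
                "person_id": f"person_{len(clusters)}",  # hypothetical ID format
                "centroid": list(emb),
                "members": [det],
            })
        else:
            best["members"].append(det)
            n = len(best["members"])
            # Running mean keeps the centroid representative of all members.
            best["centroid"] = [c + (e - c) / n for c, e in zip(best["centroid"], emb)]
    return clusters
```

Because each detection carries frame_idx and timestamp, every cluster doubles as a record of when that person appears in the video.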


2.2 - Selecting the Redaction Target

After detection and clustering, the reviewer selects a person identity from the list of detected individuals. That selection resolves into a list of face targets and their stored embeddings, which the export pipeline uses for face-lock tracking.

if person_ids:
    enriched = get_enriched_faces(job_id) or {}
    unique_faces = job.get("unique_faces") or enriched.get("unique_faces", [])
    for index, face in enumerate(unique_faces):
        stable_person_id = ensure_face_identity(face, fallback_index=index)
        if stable_person_id not in person_ids:
            continue
        face_targets.append(face)
        encoding = face.get("encoding")
        if encoding:
            face_encodings.append(encoding)
        matched_ids.append(stable_person_id)

Reverse redaction uses the same identity resolution in the opposite direction: the reviewer selects the person to preserve, and all other detected faces are blurred in the export.
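
The difference between the two modes can be reduced to a single comparison, sketched here. The function name and the person_id key are illustrative assumptions; the application resolves identities through its own helpers.

```python
def select_faces_to_blur(unique_faces, selected_ids, reverse=False):
    """Return the faces that should be blurred in the export."""
    selected = set(selected_ids)
    to_blur = []
    for face in unique_faces:
        is_selected = face.get("person_id") in selected
        # Normal mode blurs the selection; reverse mode blurs everyone else.
        if is_selected != reverse:
            to_blur.append(face)
    return to_blur
```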


2.3 - Building Face-Lock Lanes and Rendering the Redacted Export

For each selected person identity, the pipeline calls build_face_lock_lane(job_id, person_id). The lane builder draws on three sources: stored InsightFace appearance records, TwelveLabs entity search time ranges, and saved semantic person ranges from Pegasus. The result is a lane document covering the full span of that identity across the video.

face_lock_tracks = {}
if face_targets and not reverse_face_redaction:
    from services.face_lock_track import build_face_lock_lane

    for face in face_targets:
        person_id = get_face_identity(face)
        if not person_id:
            continue

        lane_doc = build_face_lock_lane(job_id, person_id)
        if lane_doc:
            face_lock_tracks[person_id] = lane_doc

appearances = collect_person_appearances(selected_face)

video_id = str(job.get("twelvelabs_video_id") or "").strip()
entity_ranges = get_entity_search_ranges(selected_face, video_id)
saved_person_ranges = get_face_semantic_time_ranges(selected_face)
semantic_ranges = entity_ranges + saved_person_ranges

segments = build_face_lock_segments(
    appearances, semantic_ranges, fps, total_frames, duration_sec,
)
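
Since entity_ranges and saved_person_ranges come from different sources, they can overlap. The internals of build_face_lock_segments are not shown here; the sketch below illustrates one step such a builder would plausibly need, merging overlapping ranges into continuous spans. The function name and the gap_sec parameter are assumptions.

```python
def merge_time_ranges(ranges, gap_sec=0.0):
    """Merge overlapping (or nearly touching) (start, end) ranges in seconds."""
    ordered = sorted((s, e) for s, e in ranges if e >= s)
    merged = []
    for start, end in ordered:
        # Extend the previous span when this range starts inside it,
        # or within gap_sec of its end.
        if merged and start <= merged[-1][1] + gap_sec:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]
```

A small gap_sec bridges brief dropouts between two sources that describe the same appearance, which helps avoid flicker in the blur at range boundaries.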

During export, the lane is converted into a frame-indexed bounding box lookup table (face_lock_bboxes_by_frame), with YOLOv8-Face detections used to refine the box positions. The renderer checks each frame against this table and calls apply_detection_redaction wherever a face-lock bounding box is registered.

if face_lock_bboxes_by_frame and not preview_only:
    for entry in face_lock_bboxes_by_frame.get(frame_idx, ()):
        lane_bbox = entry.get("bbox")
        if lane_bbox:
            apply_detection_redaction(frame, lane_bbox, "face")

The output is a stable blur that tracks the selected identity through motion, partial occlusion, and camera movement without requiring per-frame manual correction.
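
To illustrate what a frame-indexed lookup involves, the sketch below fills the frames between two keyframe detections by linear interpolation. This is a simplified stand-in under stated assumptions: the (x, y, w, h) box format, the max_gap_frames limit, and interpolation itself are illustrative, and the real table stores a list of entries per frame rather than one box.

```python
def build_bbox_lookup(keyframe_boxes, max_gap_frames=30):
    """keyframe_boxes: {frame_idx: (x, y, w, h)} for one identity.

    Returns {frame_idx: (x, y, w, h)} with linear interpolation between
    neighbouring keyframes that are at most max_gap_frames apart.
    """
    frames = sorted(keyframe_boxes)
    lookup = {idx: tuple(keyframe_boxes[idx]) for idx in frames}
    for a, b in zip(frames, frames[1:]):
        span = b - a
        if span <= 1 or span > max_gap_frames:
            continue  # adjacent frames, or a gap too long to trust
        box_a, box_b = keyframe_boxes[a], keyframe_boxes[b]
        for f in range(a + 1, b):
            t = (f - a) / span
            lookup[f] = tuple(
                round(pa + (pb - pa) * t) for pa, pb in zip(box_a, box_b)
            )
    return lookup
```

Leaving long gaps unfilled is deliberate in this sketch: blurring a guessed position across a scene cut would hide the wrong region, so those frames are better served by a fresh detection.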


Section 3: Privacy Metadata with Pegasus 1.5

Pegasus 1.5 supports schema-driven time-based metadata, which means the application can define exactly what type of privacy risk to detect and what structured fields to return for each segment. This is the mechanism that powers Meta Insights in the review interface.


3.1 - Structuring the Privacy Risk Schema

The schema defines a single segment type, privacy_risk_segment, which keeps the Pegasus output focused on actionable review targets: faces, documents, screens, license plates, sensitive text, and protected individuals. Fields like risk_level, scene_role, redaction_decision, and reason give reviewers documented rationale for each flagged moment, not just a timestamp. 

backend/services/pegasus_privacy (Line 75)

PEGASUS_RESPONSE_FORMAT = {
    "type": "segment_definitions",
    "segment_definitions": [
        {
            "id": "privacy_risk_segment",
            "description": (
                f"{PEGASUS_PRIVACY_PROMPT} Do not create broad background or crowd segments. Each "
                "segment must be narrow, actionable, and tied to one visible target that should be "
                "redacted or reviewed with care."
            ),
            "fields": [
                {
                    "name": "privacy_category",
                    "type": "string",
                    "description": (
                        "One of person, face, screen, document, text, license_plate, logo, object, scene. "
                        "Use scene only when the whole frame contains sensitive material; do not use it "
                        "for ordinary courtroom background."
                    ),
                    "enum": ["person", "face", "screen", "document", "text", "license_plate", "logo", "object", "scene"],
                },
                {
                    "name": "risk_level",
                    "type": "string",
                    "description": "One of low, medium, high.",
                    "enum": ["low", "medium", "high"],
                },
                {
                    "name": "label",
                    "type": "string",
                    "description": "Short target name, for example Main verdict subject, Protected witness, Visible ID, Phone screen, or License plate.",
                },
                {
                    "name": "description",
                    "type": "string",
                    "description": "What is visible and why this exact target needs redaction or careful review.",
                },
                {
                    "name": "reason",
                    "type": "string",
                    "description": (
                        "Specific reason this item should be redacted. For courtroom people, state why this is "
                        "the main verdict subject or another protected/private person; do not include generic "
                        "courtroom observers."
                    ),
                },
                {
                    "name": "scene_role",
                    "type": "string",
                    "description": (
                        "Role of the target in context. Use verdict_subject, defendant, respondent, or accused for "
                        "the main person whose verdict is being discussed. Ordinary judges, lawyers, clerks, officers, "
                        "jury, audience, reporters, and bystanders should not be segmented."
                    ),
                    "enum": [
                        "verdict_subject",
                        "defendant",
                        "respondent",
                        "accused",
                        "protected_witness",
                        "victim",
                        "minor",
                        "private_non_party",
                        "sensitive_item",
                        "unknown",
                    ],
                },
                # More fields defined ...
            ],
        }
    ],
}
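
Because downstream code relies on the enum values, it can be worth checking returned segments against the schema before they reach the reviewer. The following is a minimal sketch; the validate_segment_fields helper is hypothetical, and it assumes each returned segment exposes its values as a plain dict keyed by field name. Whether the service itself enforces enum constraints is not stated here.

```python
def validate_segment_fields(fields, field_defs):
    """Return a list of problems found in one segment's field values.

    field_defs has the same shape as the "fields" list in the schema:
    dicts with "name", "type", and an optional "enum".
    """
    problems = []
    for definition in field_defs:
        name = definition["name"]
        if name not in fields:
            problems.append(f"missing field: {name}")
            continue
        allowed = definition.get("enum")
        if allowed is not None and fields[name] not in allowed:
            problems.append(f"invalid value for {name}: {fields[name]!r}")
    return problems
```

Segments that fail validation can be routed to manual review instead of being dropped, so a malformed field never silently hides a privacy risk.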


3.2 - Generating Timestamped Privacy Segments

The schema is passed to Pegasus with analysis_mode: "time_based_metadata", which instructs Pegasus to return a structured timeline rather than a single document-level summary. Setting temperature: 0.1 keeps sampling variability low, so repeated runs on the same footage stay largely consistent, which matters for compliance workflows.

backend/services/twelvelabs_services (Line 1044)

def create_pegasus_privacy_task(asset_id, *, response_format):
    """Create a Pegasus 1.5 async structured-analysis task from an existing asset id."""
    body = {
        "video": {
            "type": "asset_id",
            "asset_id": asset_id,
        },
        "model_name": "pegasus1.5",
        "analysis_mode": "time_based_metadata",
        "response_format": response_format,
        "temperature": 0.1,
    }

The backend saves an initial job artifact with empty timeline_events and recommended_actions fields, then polls until the task completes. Each Pegasus segment is converted into two objects: a timeline event (with start_sec, end_sec, severity, category, reason, and redaction_decision) and a recommended action that tells the reviewer what to do next.

event = {
    "id": event_id,
    "start_sec": round(start_sec, 3),
    "end_sec": round(end_sec, 3),
    "severity": severity,
    "category": category,
    "label": label[:120],
    "description": description[:600],
    "reason": reason[:600],
    "redaction_target": redaction_target[:120] or None,
    "scene_role": scene_role[:120] or None,
    "redaction_decision": redaction_decision[:120] or None,
    "subject_selection": subject_selection[:120] or None,
    "confidence": round(confidence, 3),
    "review_required": True,
    "recommended_action_ids": [action_id],
}
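
The polling step mentioned above can be sketched generically. This is not the application's code: fetch_status is a hypothetical callable that returns the task as a dict, and the "ready" and "failed" status values are assumptions that should be checked against the actual task response.

```python
import time


def poll_until_done(fetch_status, interval_sec=5.0, timeout_sec=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the task is ready, fails, or times out."""
    deadline = clock() + timeout_sec
    while True:
        task = fetch_status()
        status = task.get("status")
        if status == "ready":      # assumed terminal success value
            return task
        if status == "failed":     # assumed terminal failure value
            raise RuntimeError(f"analysis task failed: {task}")
        if clock() >= deadline:
            raise TimeoutError("analysis task did not finish in time")
        sleep(interval_sec)
```

Passing sleep and clock as parameters keeps the loop testable without real delays, and the explicit deadline prevents a stuck task from blocking the job indefinitely.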


3.3 - Rendering Privacy Metadata as a Review Interface

The timeline lane renders each Pegasus event as a clickable hotspot positioned at its start_sec and end_sec. Clicking a hotspot opens the Meta Insights panel, focuses the event, and seeks the video to that timestamp.

The sidebar surfaces the full event metadata: severity, category, reason, redaction decision, redaction target, subject selection, scene role, confidence, and handling note. Reviewers see not just where something requires attention, but why, which is the documented rationale a GDPR audit requires.


Section 4: Open-Ended Video Analysis for Review

In addition to structured metadata, the application supports free-form contextual questions against the indexed video. A reviewer can ask what sensitive information appears throughout the footage, which moments carry the highest compliance risk, or what is happening around a specific timestamp. This is useful for situations where the reviewer does not yet know what they are looking for.

The frontend sends the video_id and the reviewer's prompt to /api/analyze-custom. The backend prepends a formatting instruction that requests timestamps whenever specific moments are referenced, then passes the combined prompt to the TwelveLabs Analyze service.

backend/services/twelvelabs_services (Line 666)

def analyze_video_custom(video_id, prompt):
    client = get_client()
    logger.info("Custom analysis on video %s", video_id)
    enhanced_prompt = f"{ANALYZE_FORMAT_INSTRUCTION}\n\n{prompt}"
    result = client.analyze(
        video_id=video_id,
        prompt=enhanced_prompt,
        temperature=0.2,
        request_options={"timeout_in_seconds": TWELVELABS_ANALYZE_TIMEOUT_SEC},
    )
    return {"data": result.data, "id": result.id}

The response returns a plain-language analysis grounded in the video content, with timestamps linking the reviewer directly to the relevant moments. This closes the gap between "I need to find something sensitive" and "here is exactly where to look."
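
To turn timestamps in the analysis text into seekable positions, the frontend needs to parse them. The sketch below assumes the formatting instruction yields MM:SS or HH:MM:SS timestamps; the contents of ANALYZE_FORMAT_INSTRUCTION are not shown in this tutorial, so the pattern is an assumption to adapt.

```python
import re

# Matches MM:SS or HH:MM:SS; the hours group is optional.
_TIMESTAMP_RE = re.compile(r"\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b")


def extract_timestamps(text):
    """Return unique timestamps in seconds, in order of first appearance."""
    seen, result = set(), []
    for match in _TIMESTAMP_RE.finditer(text):
        hours, minutes, seconds = match.groups()
        total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        if total not in seen:
            seen.add(total)
            result.append(total)
    return result
```

Each returned value can be handed to the player's seek function, so a sentence in the analysis becomes a direct link to the moment it describes.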


What This Approach Makes Possible

GDPR compliance for video has historically required one of two tradeoffs: either a slow, expensive manual review process, or a blunt automated approach that redacts more than the law requires and cannot explain its decisions.

This application takes a different path. Marengo 3.0 retrieves the right moments across an entire corpus using text, images, or registered identity entities. Pegasus 1.5 generates structured, timestamped privacy metadata with documented rationale for each flagged segment. Local face detection clusters identities from AI-guided keyframes. Face-lock lanes maintain stable blur tracks across the full export.

The result is a redaction workflow that is targeted rather than broad, documented rather than opaque, and scalable rather than manual. Every decision point has a reviewable record, which is what defensible compliance actually requires.

Introduction

Privacy review at scale is not an editing problem. It is an intelligence problem.

When a legal request lands and your team needs to locate, assess, and redact every appearance of a specific person across hours of footage, frame-by-frame review does not work. The GDPR Enforcement Tracker recorded more than €6.06 billion in cumulative fines across 2,793 cases by March 2026. Article 83 still permits penalties up to €20 million or 4% of global annual turnover for the most serious violations. The operational pressure is real, and manual workflows are not built to absorb it.

GDPR-compliant redaction requires three things working together: accurate identification of the right person or object, retrieval of every relevant appearance across a corpus, and redaction limited to what the legal purpose actually requires. Blurring everything is not a defensible strategy. The regulation's data minimization principle pushes toward precision, not blanket suppression.

This tutorial walks through a production-ready GDPR video redaction application built on TwelveLabs. The application uses Marengo 3.0 for multimodal search and entity-based retrieval, Pegasus 1.5 for structured privacy metadata, and a local face detection pipeline for frame-level identity clustering. The result is a workflow that takes a reviewer from video upload to export-ready redacted output in a single interface.

You can explore the live deployment at tl-gdpr-compliance.vercel.app and find the source code at github.com/Hrishikesh332/tl-GDPR-compliance-redaction.


What the Application Does

Most redaction tools are built around manual selection: a reviewer watches footage, draws bounding boxes, and exports a blurred clip. That approach does not scale when the corpus is large, the person appears across multiple clips, or the legal obligation requires documented reasoning for each redaction decision.

This application approaches the problem differently. TwelveLabs serves as the video understanding layer throughout the workflow. Reviewers search across indexed footage using text, images, or registered identity entities. Pegasus 1.5 generates timestamped privacy metadata that surfaces high-risk moments before a reviewer watches a single frame. Local face detection clusters detections into stable per-person identities that persist through the export pipeline. Blur tracks follow a specific person across motion, profile turns, and camera cuts.

The workflow covers the full review cycle: upload and index footage, search for subjects using natural language or face images, review AI-generated privacy risk segments on an interactive timeline, select redaction targets by person identity, and export a stable, face-locked redacted video.


Inside the Redaction Pipeline

The application connects four distinct capabilities into a single review workflow.

Multimodal search with Marengo 3.0 lets reviewers locate any person, object, or scene using text queries, image uploads, or registered face entities. Marengo returns relevant clip segments with confidence scores, which the interface renders as a visual timeline lane with review markers at the strongest matches.

Privacy metadata with Pegasus 1.5 generates timestamped segment data focused on privacy risk. Rather than describing the full video, Pegasus is given a schema that targets specific categories: faces, documents, license plates, screens, sensitive objects, and protected individuals. Each segment includes a risk level, redaction reason, recommended action, and scene role.

Local detection and identity clustering uses Pegasus timestamps to guide keyframe extraction. The application then runs local face detection and InsightFace embeddings, clustering detected faces into stable per-person identities with consistent IDs across the full video.

Face-lock blur and export redaction converts the selected person's detection history into a frame-indexed bounding box lane. During export, the renderer applies blur from that lane on each matching frame, producing a stable, identity-aware redacted video.


Setting Up the Environment

Before building, complete the following prerequisites.

  1. Create a TwelveLabs account and generate an API key. Create an index with Marengo 3.0 and Pegasus 1.5 enabled, and record the Index ID.

  2. Create a TwelveLabs Entity Collection to support face registration and entity search.

  3. Install Python 3 for the Flask backend and Node.js/npm for the React frontend. FFmpeg is recommended for video re-encoding on export.

  4. Clone the repository and follow the setup instructions in the README.

Create backend/.env using the variables defined in .env.example.


Section 1: Entity Management with TwelveLabs

Locating a specific known person across indexed footage requires more than a text search. Registering a face as a TwelveLabs entity creates a reusable identity reference that can be combined with context queries: find this person near a document, in a specific scene type, or performing a specific action.

This matters for redaction precision. Instead of flagging every face in the video, the reviewer can anchor the search to a known identity and retrieve only the moments where that person is visible.


1.1 - Registering Face Assets in the Entity Index

When the user uploads a face image and provides a name, the backend detects the face using a ResNet-10 Face Detector, generates a preview crop, and prepares a face asset for registration with TwelveLabs.

The flow has two steps: upload the face image as an asset, then create a named entity referencing the returned asset ID. This links the visual reference to a searchable identity.

asset_id = twelvelabs_service.upload_face_asset(tmp.name)
metadata = {"name": name}
if preview_base64:
    metadata["face_snap_base64"] = preview_base64

entity_result = twelvelabs_service.create_entity(
    name=name,
    asset_ids=[asset_id],
    description=description or f"Face entity: {name}",
    metadata=metadata,

After registration, the entity is available for identity-constrained search across any indexed video in the collection.


1.2 - Searching with Entity and Text

Search supports three modes: entity-based, text-based, and image-based. Reviewers can combine them depending on what they know at the start of the review.

For entity-based search, the backend wraps the entity ID in TwelveLabs' mention format (<@entity_id>). When the user adds a text qualifier, both are combined so the search is constrained by both identity and context.

backend/services/twelvelabs_services (Line 1214)

def entity_search(entity_id, query_suffix="", index_id=None):
    client = get_client()
    idx = resolve_index_id(index_id)

    query_text = f"<@{entity_id}>"
    if query_suffix:
        query_text = f"<@{entity_id}> {query_suffix}"

    logger.info("Entity search: %s", query_text)
    response = client.search.query(
        index_id=idx,
        search_options=["visual"],
        query_text=query_text,
        group_by="video",
        sort_option="score",
        page_limit=50,
    )
    return serialize_search_results(response)

For image-based search, the backend passes the image as query_media_type="image" with either query_media_file or query_media_url.

if image_url:
    response = client.search.query(
        **kwargs,
        query_media_type="image",
        query_media_url=image_url,
    )

if image_path:
    with open(image_path, "rb") as image_file:
        response = client.search.query(
            **kwargs,
            query_media_type="image",
            query_media_file=image_file,
        )


1.3 - Search Timeline Lane and Review Markers

Search results are rendered not just as a ranked list but as a visual timeline lane. Each result segment has a start and end time, which the frontend maps onto the video scrubber. High-confidence matches are marked with red review indicators so reviewers can distinguish the strongest hits from surrounding context.

This turns search into a review workflow. The reviewer searches, inspects the timeline, and jumps directly to the moments most likely to require a redaction decision.


Section 2: Detection and Face-Lock Redaction

Detection connects TwelveLabs video understanding to the local computer vision pipeline. Rather than running face detection on every frame, Pegasus 1.5 first generates structured metadata about where people appear in the video. The backend uses those timestamps to guide keyframe extraction, making the detection pass faster and more targeted.

Detected faces are clustered into stable per-person identities. The reviewer selects from those identities, not from individual bounding boxes. The selection drives the face-lock lane used for export redaction.


2.1 - Combining TwelveLabs Context with Local Face Detection

The pipeline begins with a Pegasus structured analysis task. The schema defines two segment types: face_redaction_target (for people who may require redaction) and scene_segment (for broader scene context). Setting temperature: 0.1 keeps the output consistent and suitable for automated pipelines.

backend/services/twelvelabs_services (Line 102)

PIPELINE_METADATA_RESPONSE_FORMAT = {
    "type": "segment_definitions",
    "segment_definitions": [
        {
            "id": "face_redaction_target",
            "description": (
                "Return people segments for face redaction decisions. Create one segment per "
                "distinct face/person for each continuous time range where their face is visible "
                "enough to matter for redaction."
            ),
            "fields": [
                {"name": "name", "type": "string"},
                {"name": "description", "type": "string"},
                {"name": "should_anonymize", "type": "boolean"},
                {"name": "is_official", "type": "boolean"},
                {"name": "review_required", "type": "boolean"},
                {"name": "redaction_reason", "type": "string"},
                {"name": "confidence", "type": "number"},
            ],
        },
        {
            "id": "scene_segment",
            "fields": [
                {"name": "description", "type": "string"},
                {"name": "confidence", "type": "number"},
            ],
        },
    ],
}
body = {
    "video": {
        "type": "asset_id",
        "asset_id": asset_id,
    },
    "model_name": "pegasus1.5",
    "analysis_mode": "time_based_metadata",
    "response_format": PIPELINE_METADATA_RESPONSE_FORMAT,
    "temperature": 0.1,
}

Once Pegasus returns the people metadata, the application extracts the time ranges and samples keyframes from those windows. Each keyframe passes through detect_faces(..., with_encodings=True), which uses InsightFace to locate faces, calculate bounding boxes, assess sharpness, and generate identity embeddings.

for kf in keyframes:
    faces = detect_faces(kf["frame"], with_encodings=True)
    for f in faces:
        f["frame_idx"] = kf["frame_idx"]
        f["timestamp"] = kf["timestamp"]
        all_faces.append(f)

Each detection stores frame index and timestamp alongside the embedding, creating time-aware redaction metadata that the clustering step can group into consistent person identities.


2.2 - Selecting the Redaction Target

After detection and clustering, the reviewer selects a person identity from the list of detected individuals. That selection resolves into a list of face targets and their stored embeddings, which the export pipeline uses for face-lock tracking.

if person_ids:
    enriched = get_enriched_faces(job_id) or {}
    unique_faces = job.get("unique_faces") or enriched.get("unique_faces", [])
    for index, face in enumerate(unique_faces):
        stable_person_id = ensure_face_identity(face, fallback_index=index)
        if stable_person_id not in person_ids:
            continue
        face_targets.append(face)
        encoding = face.get("encoding")
        if encoding:
            face_encodings.append(encoding)
        matched_ids.append(stable_person_id)

Reverse redaction uses the same identity resolution in the opposite direction: the reviewer selects the person to preserve, and all other detected faces are blurred in the export.


2.3 - Building Face-Lock Lanes and Rendering the Redacted Export

For each selected person identity, the pipeline calls build_face_lock_lane(job_id, person_id). The lane builder draws on three sources: stored InsightFace appearance records, TwelveLabs entity search time ranges, and saved semantic person ranges from Pegasus. The result is a lane document covering the full span of that identity across the video.

face_lock_tracks = {}
if face_targets and not reverse_face_redaction:
    from services.face_lock_track import build_face_lock_lane

    for face in face_targets:
        person_id = get_face_identity(face)
        if not person_id:
            continue

        lane_doc = build_face_lock_lane(job_id, person_id)
        if lane_doc:
            face_lock_tracks[person_id] = lane_doc
appearances = collect_person_appearances(selected_face)

video_id = str(job.get("twelvelabs_video_id") or "").strip()
entity_ranges = get_entity_search_ranges(selected_face, video_id)
saved_person_ranges = get_face_semantic_time_ranges(selected_face)
semantic_ranges = entity_ranges + saved_person_ranges

segments = build_face_lock_segments(
    appearances, semantic_ranges, fps, total_frames, duration_sec,
)

During export, the lane is converted into a frame-indexed bounding box lookup table (face_lock_bboxes_by_frame) and guided with YOLOv8-Face refinement. The renderer checks each frame against this table and applies apply_detection_redaction wherever a face-lock bounding box is registered.

if face_lock_bboxes_by_frame and not preview_only:
    for entry in face_lock_bboxes_by_frame.get(frame_idx, ()):
        lane_bbox = entry.get("bbox")
        if lane_bbox:
            apply_detection_redaction(frame, lane_bbox, "face")

The output is a stable blur that tracks the selected identity through motion, partial occlusion, and camera movement without requiring per-frame manual correction.


Section 3: Privacy Metadata with Pegasus 1.5

Pegasus 1.5 supports schema-driven time-based metadata, which means the application can define exactly what type of privacy risk to detect and what structured fields to return for each segment. This is the mechanism that powers Meta Insights in the review interface.


3.1 - Structuring the Privacy Risk Schema

The schema defines a single segment type, privacy_risk_segment, which keeps the Pegasus output focused on actionable review targets: faces, documents, screens, license plates, sensitive text, and protected individuals. Fields like risk_level, scene_role, redaction_decision, and reason give reviewers documented rationale for each flagged moment, not just a timestamp. 

backend/services/pegasus_privacy (Line 75)

PEGASUS_RESPONSE_FORMAT = {
    "type": "segment_definitions",
    "segment_definitions": [
        {
            "id": "privacy_risk_segment",
            "description": (
                f"{PEGASUS_PRIVACY_PROMPT} Do not create broad background or crowd segments. Each "
                "segment must be narrow, actionable, and tied to one visible target that should be "
                "redacted or reviewed with care."
            ),
            "fields": [
                {
                    "name": "privacy_category",
                    "type": "string",
                    "description": (
                        "One of person, face, screen, document, text, license_plate, logo, object, scene. "
                        "Use scene only when the whole frame contains sensitive material; do not use it "
                        "for ordinary courtroom background."
                    ),
                    "enum": ["person", "face", "screen", "document", "text", "license_plate", "logo", "object", "scene"],
                },
                {
                    "name": "risk_level",
                    "type": "string",
                    "description": "One of low, medium, high.",
                    "enum": ["low", "medium", "high"],
                },
                {
                    "name": "label",
                    "type": "string",
                    "description": "Short target name, for example Main verdict subject, Protected witness, Visible ID, Phone screen, or License plate.",
                },
                {
                    "name": "description",
                    "type": "string",
                    "description": "What is visible and why this exact target needs redaction or careful review.",
                },
                {
                    "name": "reason",
                    "type": "string",
                    "description": (
                        "Specific reason this item should be redacted. For courtroom people, state why this is "
                        "the main verdict subject or another protected/private person; do not include generic "
                        "courtroom observers."
                    ),
                },
                {
                    "name": "scene_role",
                    "type": "string",
                    "description": (
                        "Role of the target in context. Use verdict_subject, defendant, respondent, or accused for "
                        "the main person whose verdict is being discussed. Ordinary judges, lawyers, clerks, officers, "
                        "jury, audience, reporters, and bystanders should not be segmented."
                    ),
                    "enum": [
                        "verdict_subject",
                        "defendant",
                        "respondent",
                        "accused",
                        "protected_witness",
                        "victim",
                        "minor",
                        "private_non_party",
                        "sensitive_item",
                        "unknown",
                    ],
                },
                # More fields defined ...
            ],
        }
    ],
}
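Because several fields are constrained to enums, each returned segment can be sanity-checked before it reaches the timeline. The following is a minimal sketch, not the application's code: it assumes segment metadata arrives as a flat dict keyed by field name, and find_enum_violations is a hypothetical helper. Only the enum fields visible in the excerpt above are covered.

```python
from typing import Any

# Enum values copied from the schema excerpt; fields elided there are not covered.
ALLOWED_VALUES: dict[str, set[str]] = {
    "privacy_category": {"person", "face", "screen", "document", "text",
                         "license_plate", "logo", "object", "scene"},
    "risk_level": {"low", "medium", "high"},
    "scene_role": {"verdict_subject", "defendant", "respondent", "accused",
                   "protected_witness", "victim", "minor", "private_non_party",
                   "sensitive_item", "unknown"},
}


def find_enum_violations(metadata: dict[str, Any]) -> list[str]:
    """Return the names of enum fields whose value is missing or not allowed.

    Assumption: `metadata` is a flat dict of field name -> value.
    """
    violations = []
    for field, allowed in ALLOWED_VALUES.items():
        value = metadata.get(field)
        # Non-string values (None, lists, numbers) are always treated as invalid.
        if not isinstance(value, str) or value not in allowed:
            violations.append(field)
    return violations
```

A segment that fails this check could be routed to manual review rather than dropped, so that nothing the model flagged is lost silently.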


3.2 - Generating Timestamped Privacy Segments

The schema is passed to Pegasus with analysis_mode: "time_based_metadata", which instructs Pegasus to return a structured timeline rather than a single document summary. Setting temperature: 0.1 keeps sampling variation low, so repeated runs produce largely consistent output. It does not guarantee identical results, but that consistency matters in compliance workflows.

backend/services/twelvelabs_services (Line 1044)

def create_pegasus_privacy_task(asset_id, *, response_format):
    """Create a Pegasus 1.5 async structured-analysis task from an existing asset id."""
    body = {
        "video": {
            "type": "asset_id",
            "asset_id": asset_id,
        },
        "model_name": "pegasus1.5",
        "analysis_mode": "time_based_metadata",
        "response_format": response_format,
        "temperature": 0.1,
    }
    # ... (excerpt ends here; the full function goes on to submit `body` as an async analysis task)
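Because the task is asynchronous, the caller has to wait for a terminal status before reading results. The sketch below shows one way to do that, under stated assumptions: fetch_status is a hypothetical callable that returns the latest task payload as a dict, and the "ready" and "failed" status strings are placeholders that should be matched to the real API response.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], dict],
    *,
    interval_sec: float = 5.0,
    timeout_sec: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the task reports a terminal status.

    Assumptions: the payload is a dict with a "status" key, and the
    terminal values are "ready" and "failed" (hypothetical names).
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        task = fetch_status()
        status = task.get("status")
        if status == "ready":
            return task
        if status == "failed":
            raise RuntimeError(f"Pegasus task failed: {task}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Pegasus task did not finish in time")
        # Wait before the next status check to avoid hammering the API.
        sleep(interval_sec)
```

Injecting sleep as a parameter keeps the loop testable without real delays. A production version would likely add backoff and persist progress to the job artifact between polls.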

The backend saves an initial job artifact with empty timeline_events and recommended_actions fields, then polls until the task completes. Each Pegasus segment is converted into two objects: a timeline event (with start_sec, end_sec, severity, category, reason, and redaction_decision) and a recommended action that tells the reviewer what to do next.

event = {
    "id": event_id,
    "start_sec": round(start_sec, 3),
    "end_sec": round(end_sec, 3),
    "severity": severity,
    "category": category,
    "label": label[:120],
    "description": description[:600],
    "reason": reason[:600],
    "redaction_target": redaction_target[:120] or None,
    "scene_role": scene_role[:120] or None,
    "redaction_decision": redaction_decision[:120] or None,
    "subject_selection": subject_selection[:120] or None,
    "confidence": round(confidence, 3),
    "review_required": True,
    "recommended_action_ids": [action_id],
}
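The split into an event and a linked action can be sketched as follows. This is an illustration and not the backend's implementation: the segment shape (start_sec, end_sec, and a metadata dict), the fallback values, the id format, and the action fields are all assumptions. Only a subset of the event fields shown above is included.

```python
import uuid


def segment_to_event_and_action(segment: dict) -> tuple[dict, dict]:
    """Split one privacy segment into a timeline event and a reviewer action.

    Assumed input shape: {"start_sec": ..., "end_sec": ..., "metadata": {...}}.
    """
    meta = segment.get("metadata", {})
    event_id = f"evt_{uuid.uuid4().hex[:8]}"
    action_id = f"act_{uuid.uuid4().hex[:8]}"
    start_sec = float(segment["start_sec"])
    # Guard against inverted ranges so the timeline never gets a negative span.
    end_sec = max(start_sec, float(segment["end_sec"]))

    event = {
        "id": event_id,
        "start_sec": round(start_sec, 3),
        "end_sec": round(end_sec, 3),
        "severity": meta.get("risk_level", "medium"),        # assumed fallback
        "category": meta.get("privacy_category", "object"),  # assumed fallback
        "label": str(meta.get("label", ""))[:120],
        "reason": str(meta.get("reason", ""))[:600],
        "review_required": True,
        "recommended_action_ids": [action_id],
    }
    action = {
        "id": action_id,
        "event_id": event_id,
        "title": f"Review {event['label'] or event['category']}",
        "start_sec": event["start_sec"],
        "end_sec": event["end_sec"],
    }
    return event, action
```

Keeping the two objects linked by id in both directions lets the interface jump from a timeline hotspot to its action and back.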


3.3 - Rendering Privacy Metadata as a Review Interface

The timeline lane renders each Pegasus event as a clickable hotspot positioned at its start_sec and end_sec. Clicking a hotspot opens the Meta Insights panel, focuses the event, and seeks the video to that timestamp.
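Positioning a hotspot reduces to converting start_sec and end_sec into percentages of the video duration. The sketch below shows that arithmetic in Python for consistency with the backend excerpts. The function name, the minimum-width default, and the rounding are assumptions, and the actual frontend may compute this differently.

```python
def hotspot_geometry(
    start_sec: float,
    end_sec: float,
    duration_sec: float,
    min_width_pct: float = 0.5,
) -> tuple[float, float]:
    """Return (left_pct, width_pct) for an event on a timeline lane."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    # Clamp the event into the playable range.
    start = min(max(start_sec, 0.0), duration_sec)
    end = min(max(end_sec, start), duration_sec)
    left_pct = start / duration_sec * 100.0
    # Very short events still get a clickable minimum width.
    width_pct = max((end - start) / duration_sec * 100.0, min_width_pct)
    # Keep the hotspot inside the lane when the minimum width would overflow it.
    left_pct = min(left_pct, 100.0 - width_pct)
    return round(left_pct, 2), round(width_pct, 2)


print(hotspot_geometry(30, 45, 120))  # (25.0, 12.5)
```

The minimum width matters for single-frame events such as a briefly visible ID card, which would otherwise render as an unclickable sliver.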

The sidebar surfaces the full event metadata: severity, category, reason, redaction decision, redaction target, subject selection, scene role, confidence, and handling note. Reviewers see not just where something requires attention, but why, which is the documented rationale a GDPR audit requires.


Section 4: Open-Ended Video Analysis for Review

In addition to structured metadata, the application supports free-form contextual questions against the indexed video. A reviewer can ask what sensitive information appears throughout the footage, which moments carry the highest compliance risk, or what is happening around a specific timestamp. This is useful for situations where the reviewer does not yet know what they are looking for.

The frontend sends the video_id and the reviewer's prompt to /api/analyze-custom. The backend prepends a formatting instruction that requests timestamps whenever specific moments are referenced, then passes the combined prompt to the TwelveLabs Analyze service.

backend/services/twelvelabs_services (Line 666)

def analyze_video_custom(video_id, prompt):
    client = get_client()
    logger.info("Custom analysis on video %s", video_id)
    enhanced_prompt = f"{ANALYZE_FORMAT_INSTRUCTION}\n\n{prompt}"
    result = client.analyze(
        video_id=video_id,
        prompt=enhanced_prompt,
        temperature=0.2,
        request_options={"timeout_in_seconds": TWELVELABS_ANALYZE_TIMEOUT_SEC},
    )
    return {"data": result.data, "id": result.id}

The response returns a plain-language analysis grounded in the video content, with timestamps linking the reviewer directly to the relevant moments. This closes the gap between "I need to find something sensitive" and "here is exactly where to look."
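To turn those timestamps into seek targets, the frontend or backend has to parse them out of the returned text. The sketch below assumes they appear as MM:SS or HH:MM:SS. The actual format depends on ANALYZE_FORMAT_INSTRUCTION, which is not shown here, so the pattern is an assumption and extract_timestamps is a hypothetical helper.

```python
import re

# Matches MM:SS or HH:MM:SS; the hour group is optional.
TIMESTAMP_RE = re.compile(r"\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b")


def extract_timestamps(text: str) -> list[float]:
    """Return timestamps found in text as seconds, in order of appearance."""
    seconds = []
    for hours, minutes, secs in TIMESTAMP_RE.findall(text):
        total = int(hours or 0) * 3600 + int(minutes) * 60 + int(secs)
        seconds.append(float(total))
    return seconds


print(extract_timestamps("The witness appears at 01:15 and again at 1:02:03."))
# [75.0, 3723.0]
```

Each value can then be passed to the player's seek call, in the same way a click on a timeline hotspot is handled.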


What This Approach Makes Possible

GDPR compliance for video has historically required one of two tradeoffs: either a slow, expensive manual review process, or a blunt automated approach that redacts more than the law requires and cannot explain its decisions.

This application takes a different path. Marengo 3.0 retrieves the right moments across an entire corpus using text, images, or registered identity entities. Pegasus 1.5 generates structured, timestamped privacy metadata with documented rationale for each flagged segment. Local face detection clusters identities from AI-guided keyframes. Face-lock lanes maintain stable blur tracks across the full export.

The result is a redaction workflow that is targeted rather than broad, documented rather than opaque, and scalable rather than manual. Every decision point has a reviewable record, which is what defensible compliance actually requires.

Resources