Unified Schema for Connector Content

Search AI utilizes a Unified Schema to standardize data ingestion from diverse content sources, including enterprise applications, files, and webpages. This schema defines a consistent structure that allows data from different formats and systems to be interpreted and utilized uniformly for search operations.

When content is ingested via connectors, data from various fields across different applications is automatically mapped to the most relevant fields in the unified schema. This ensures that Search AI maintains a consistent representation of content, regardless of the source.

The Unified Schema has a predefined set of fields, also referred to as Document Fields, to store the content and metadata of the ingested content. During ingestion, data from the source application is automatically assigned to the most relevant unified schema field. Users can override default mappings using the Field Mapping option in the connector configuration. The schema can also be extended to accommodate new custom fields.

The following are the default fields of the Unified Schema.

Note

Some of the fields in the list are system fields and can't be updated.

Document Fields Description Is System Field
access_level Defines the visibility or permission level associated with the document. No
archived_at Timestamp indicating when the document or record was archived. No
assignee Identifier of the user or entity responsible for the document, task, or record. No
assignee_email Email address of the user assigned to the document No
assignee_name Display name of the assignee No
blockedAcl A list of users or groups explicitly restricted from accessing the document. No
branch Represents the branch, version, or division of content, particularly in systems that support branching (for example, code repositories or knowledge bases). No
category Classification label used to group similar documents or content types No
channel_id Unique identifier for the communication channel from where the document originates. No
checksum A unique hash value generated for the document content. No
chunkType Type of chunk. Yes
closedOn Timestamp indicating when the item (for example, issue, task, or conversation) was closed. No
comment_count Total number of comments associated with the item. No
comments List or collection of user comments related to the item No
commit_id Unique identifier of the commit associated with the item. No
company_id Unique identifier for the company or organization. No
company_name Name of the company associated with the record. No
contact_id Unique identifier for the contact person. No
contact_name Name of the contact person. No
content Main textual or structured content of the record (for example, body of a document, note, or comment). No
contentId Unique identifier of the content entity. No
conversation_id Unique identifier of the conversation or thread. No
createdBy User ID or name of the person who created the item. No
createdOn Timestamp when the item was created. No
deleted_at Timestamp when the item was deleted (if soft-deleted). No
doc_created_by Identifier or name of the user who created the document. No
doc_created_by_email Email address of the document creator. No
doc_created_by_id Unique ID of the document creator. No
doc_created_by_name Full name of the document creator. No
doc_created_on Timestamp when the document was created. No
doc_id Unique identifier of the document. No
doc_path File path or storage path of the document. No
doc_source_type Type of source from which the document was ingested. No
doc_updated_by Identifier or name of the user who last updated the document. No
doc_updated_by_email Email address of the user who updated the document. No
doc_updated_by_id Unique ID of the user who last updated the document. No
doc_updated_on Timestamp when the document was last updated. No
downvote_count Number of downvotes received by the item (for example, post, comment, or answer). No
due_date The due date or deadline associated with the task or item. No
extractionMethod Method used to extract data from the source. Yes
extractionStrategy Strategy or approach followed for data extraction Yes
file_content Actual text or encoded content of the file. No
file_image_url URL to the preview image of the file. No
file_preview Short summary or visual preview of the file content. No
file_title Title or display name of the file. No
file_url Direct URL link to access or download the file. No
html Raw HTML version of the document or page content. No
issueType Type or category of issue. No
keywords List of keywords or tags extracted or assigned to the content. No
labels Labels or classifications applied to the item No
language Language in which the content is written No
lastSyncAt Timestamp of the most recent synchronization with the source system. No
location Physical or virtual location associated with the record No
mentioned_users List of users mentioned or tagged within the content. No
message_type Type of message No
mime_type MIME type of the file or document No
object_created_by_email Email address of the user who created the object. No
object_created_by_id ID of the user who created the object. No
object_created_by_name Name of the user who created the object. No
object_created_on Timestamp when the object was created. No
object_type Type of object No
organization_id Unique identifier for the organization. No
organization_name Name of the organization associated with the record. No
owner_email Email address of the item owner or assignee. No
owner_id Unique ID of the item owner or assignee No
owner_name Full name of the item owner or assignee. No
page_body Text content or body of an HTML page No
page_count Number of pages in the document from which the content is ingested. No
page_html Page content in HTML format. No
page_image_url URL for the page image or thumbnail No
page_preview Short preview of the page content. No
page_title Title of the page. No
page_number Page number of the content No
page_url URL of the page or web resource. No
parent_url URL of the parent document or source from which this page is derived. No
parent_name Name of the parent entity. No
priority Priority level of the item. No
project_description Description or summary of the project. No
project_id Unique identifier for the project. No
project_name Name of the project. No
project_owner_email Email address of the project owner No
project_owner_id ID of the project owner. No
project_owner_name Name of the project owner. No
project_status Current status of the project. No
projectName Name of the project. No
published_at Timestamp when the item or content was published. No
reporter Identifier or name of the person who reported the issue. No
reporter_email Email address of the reporter. No
reporter_name Full name of the reporter. No
repository_id Unique ID of the code or content repository No
repository_name Name of the repository. No
resource_type Type of resource No
share_count Number of times the item has been shared No
size File size or data volume No
sprint Sprint or iteration to which the item belongs No
status Current status of the item No
sys_file_type System-defined file type classification Yes
sys_racl Role-based Access Control List defining permissions for the resource. No
sourceType Type of content source: web crawl, file upload, or connector. No
sys_source_name Name of the system or connector from which the item originated. Yes
tags Tags associated with the record for categorization or search. No
thread_id Unique identifier of the thread or discussion chain. No
title Title or name of the item. No
updatedBy Identifier or name of the user who last updated the record. No
updatedOn Timestamp when the record was last updated. No
url Link to access the resource or item. No
upvote_count Number of upvotes received by the item. No
view_count Number of times the item has been viewed. No
visibility Access level of the item. No
workspace_id Unique identifier for the workspace or environment. No
workspace_name Name of the workspace associated with the item. No
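
Because system fields can't be updated, it can help to check a set of planned field updates against them before writing a mapping. The following is a minimal sketch, not part of Search AI: the helper name findSystemFieldUpdates and the sample values are hypothetical, and the set of system fields is copied from the rows marked Yes in the list above.

```javascript
// Fields marked "Yes" under Is System Field in the list above.
const SYSTEM_FIELDS = new Set([
  "chunkType",
  "extractionMethod",
  "extractionStrategy",
  "sys_file_type",
  "sys_source_name",
]);

// Hypothetical helper: returns the names in `updates` that are system fields
// and therefore should not be assigned by a mapping.
function findSystemFieldUpdates(updates) {
  return Object.keys(updates).filter((name) => SYSTEM_FIELDS.has(name));
}

// Illustrative values only.
const proposedUpdates = {
  title: "Quarterly report",
  sys_file_type: "pdf",
  chunkType: "text",
};

console.log(findSystemFieldUpdates(proposedUpdates));
```

An empty result means none of the planned updates touch a system field.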

Custom Fields in Schema

Search AI allows the extension of the Unified Schema by adding up to 50 custom fields, enabling users to include additional data from third-party applications as searchable content. This flexibility ensures that unique business requirements and specialized metadata can be accommodated seamlessly.

Custom fields can also be used in the workbench, where users can map any value to them. For example, users can send the ingested content to an LLM, ask it to summarize the content, and store the summary in a custom field.

Adding a New Field

To add a new custom field:

  • Click the Manage Schema button on the Manage Content page in the connector.
  • Click the +New Field button.
  • Enter the following fields:
    • Display Name - The user-friendly name for the field (for display in UI only).
    • Data Type - The type of value the field holds. This can be a string or an array.
    • Field Name - The technical name of the field. This name is used as a reference in the scripts in the document workbench or in the post-processor script for field mapping in connectors. For array-type fields, use cfa1 to cfa5, and for string-type fields, use cfs1 to cfs45.
    • Description - A brief description of the intended use of the field.
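
The naming convention for Field Name can be checked with a small function. This is only a sketch under the assumption that the valid names are cfa1 through cfa5 for arrays and cfs1 through cfs45 for strings (50 names in total, matching the 50-field limit). The function customFieldType is hypothetical and not part of Search AI.

```javascript
// Returns "array" for cfa1..cfa5, "string" for cfs1..cfs45, and null otherwise.
function customFieldType(fieldName) {
  // "cf", then "a" or "s", then a positive integer without leading zeros.
  const match = /^cf([as])([1-9]\d*)$/.exec(fieldName);
  if (!match) return null;

  const index = Number(match[2]);
  if (match[1] === "a") {
    return index <= 5 ? "array" : null;
  }
  return index <= 45 ? "string" : null;
}

console.log(customFieldType("cfa3"));
console.log(customFieldType("cfs45"));
console.log(customFieldType("cfs46"));
```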

Field Mapping

By default, the fields ingested from a connector are automatically mapped to the most appropriate fields in the unified schema. However, this mapping can be customized for specific business requirements.

For example, assume an organization uses a Google Drive connector to ingest documents into Search AI. By default, the Google Drive field createdTime is mapped to the unified schema field createdOn. Suppose the organization also wants to display the last modified user information in search results. To achieve this, the field mapping can be updated to include the Google Drive field lastModifyingUser.displayName, mapping it to the unified schema field updatedBy.
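
The Google Drive example could be expressed in a post-processor script roughly as follows. This is a sketch, not the connector's actual default script: it assumes the raw Google Drive record is exposed as context.raw_json, as in the script example later on this page. The wrapper function mapDriveFields and the sample record are hypothetical and exist only so that the snippet runs on its own.

```javascript
// Sketch of a post-processor mapping for a Google Drive record.
// Assumption: the raw connector record is available as context.raw_json.
function mapDriveFields(context) {
  // Default mapping described above: createdTime -> createdOn.
  context.createdOn = context?.raw_json?.createdTime;
  // Custom mapping: last modifying user's display name -> updatedBy.
  context.updatedBy = context?.raw_json?.lastModifyingUser?.displayName;
  return context;
}

// Illustrative record; the values are made up.
const driveContext = mapDriveFields({
  raw_json: {
    createdTime: "2025-04-04T12:14:32.419Z",
    lastModifyingUser: { displayName: "John Doe" },
  },
});

console.log(driveContext.createdOn, driveContext.updatedBy);
```

The optional chaining operator (?.) leaves updatedBy undefined instead of throwing an error when a record has no lastModifyingUser object.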

Implementing Field Mapping

After an initial sync with a connector, you can view the payload of the response and use it to map the fields as required with the post-processor script.

  • Go to the Field Mapping tab under Manage Content.
  • The source payload shows the actual response from the connector. The mandatory fields required by Search AI are listed in the right pane.
  • Use the source payload and post-processor scripts to map fields from the source applications to the fields of the unified schema. A default script is presented for each connector, which shows how the fields are mapped for the connector by default.

For instance, if the source payload is as follows and you need to map the createdAt field to the doc_created_on field in the unified schema, add the line shown under Script Updates to the script.

Source Payload

{
    "incidents": {
        "title": "I : System Outages duplicates ----",
        "content": "System Outages , Impact Start Date : 2025-04-04T12:14:32.419Z, Impact End Date : Mon May 12 2025 10:12:25 GMT+0000 (Coordinated Universal Time), Responders : User : John Doe , Actions : ",
        "type": "incident",
        "id": "79d68c5a-762f-4c0a-b412-49a6d75b92b0",
        "tinyId": "5",
        "status": "open",
        "labels": [
            "System Outages"
        ],
        "createdAt": "2025-04-04T12:14:32.419Z",
        "updatedAt": "2025-04-04T12:14:49.526Z",
        "priority": "P3",
        "responders": "User: John Doe, ",
        "actions": [],
        "impactStartDate": "2025-04-04T12:14:32.419Z",
        "impactEndDate": "2025-05-12T10:12:25.985Z"
    }
}

Script Updates

context.doc_created_on = context?.raw_json?.createdAt;
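
To see what this line does, it can be run against a stand-in context object. The sketch below assumes that context.raw_json holds the fields of a single record (the object under "incidents" in the source payload). The wrapper function mapCreatedOn and the mock object are hypothetical and are not part of the connector script.

```javascript
// Hypothetical wrapper around the mapping line so it can be run in isolation.
function mapCreatedOn(context) {
  context.doc_created_on = context?.raw_json?.createdAt;
  return context;
}

// Mock context built from a few fields of the source payload above.
const incidentContext = mapCreatedOn({
  raw_json: {
    title: "I : System Outages duplicates ----",
    type: "incident",
    createdAt: "2025-04-04T12:14:32.419Z",
    updatedAt: "2025-04-04T12:14:49.526Z",
  },
});

console.log(incidentContext.doc_created_on);

// A record without createdAt leaves doc_created_on undefined instead of failing.
const partialContext = mapCreatedOn({ raw_json: { title: "No timestamps" } });
console.log(partialContext.doc_created_on);
```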

If a connector supports multiple objects, the source payload displays a concatenated set of fields for all those objects. When mapping fields from two or more supported objects to custom fields, create a separate custom field for each object, as shown below. Even though the records for these objects are distinct, the field mapping section is currently set up to configure them together.

For instance, if a connector supports incidents and alerts, and the titles of these are to be assigned to custom fields, use separate custom fields.

context.cfs1 = context?.raw_json?.incidentTitle;
context.cfs2 = context?.raw_json?.alertTitle;
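
The effect of using separate custom fields can be illustrated by applying the same mapping to one record of each type. This is a sketch under the assumption that each record's raw_json carries only the fields of its own object. The wrapper function mapTitles and the sample titles are hypothetical.

```javascript
// Hypothetical wrapper around the two mapping lines so they can be run in isolation.
function mapTitles(context) {
  context.cfs1 = context?.raw_json?.incidentTitle;
  context.cfs2 = context?.raw_json?.alertTitle;
  return context;
}

// Illustrative records; the titles are made up.
const incidentRecord = mapTitles({ raw_json: { incidentTitle: "System Outages" } });
const alertRecord = mapTitles({ raw_json: { alertTitle: "High CPU usage" } });

// Each record fills only the custom field that belongs to its object type.
console.log(incidentRecord.cfs1, incidentRecord.cfs2);
console.log(alertRecord.cfs1, alertRecord.cfs2);
```

With a single shared custom field, the second assignment would overwrite the first with undefined on every incident record, which is why one field per object is used.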