Unified Schema for Connector Content

Search AI utilizes a Unified Schema to standardize data ingestion from diverse content sources, including enterprise applications, files, and webpages. This schema defines a consistent structure that allows data from different formats and systems to be interpreted and utilized uniformly for search operations.

When content is ingested via connectors, data from various fields across different applications is automatically mapped to the most relevant fields in the unified schema. This ensures that Search AI maintains a consistent representation of content, regardless of the source.

The Unified Schema has a predefined set of fields, also referred to as Document Fields, to store the content and metadata of the ingested content. During ingestion, data from the source application is automatically assigned to the most relevant unified schema field. Users can override default mappings using the Field Mapping option in the connector configuration. The schema can also be extended to accommodate new custom fields.

The following are the default fields of the Unified Schema.

Note

Some of the fields in the list are system fields and can't be updated.

Document Fields	Description	Is System Field
access_level	Defines the visibility or permission level associated with the document.	No
archived_at	Timestamp indicating when the document or record was archived.	No
assignee	Identifier of the user or entity responsible for the document, task, or record.	No
assignee_email	Email address of the user assigned to the document.	No
assignee_name	Display name of the assignee	No
blockedAcl	A list of users or groups explicitly restricted from accessing the document.	No
branch	Represents the branch, version, or division of content, particularly in systems that support branching (for example, code repositories or knowledge bases).	No
category	Classification label used to group similar documents or content types	No
channel_id	Unique identifier for the communication channel from where the document originates.	No
checksum	A unique hash value generated for the document content.	No
chunkType	Type of chunk.	Yes
closedOn	Timestamp indicating when the item (for example, an issue, task, or conversation) was closed.	No
comment_count	Total number of comments associated with the item.	No
comments	List or collection of user comments related to the item	No
commit_id	Unique identifier of the commit associated with the item.	No
company_id	Unique identifier for the company or organization.	No
company_name	Name of the company associated with the record.	No
contact_id	Unique identifier for the contact person.	No
contact_name	Name of the contact person.	No
content	Main textual or structured content of the record (for example, body of a document, note, or comment).	No
contentId	Unique identifier of the content entity.	No
conversation_id	Unique identifier of the conversation or thread.	No
createdBy	User ID or name of the person who created the item.	No
createdOn	Timestamp when the item was created.	No
deleted_at	Timestamp when the item was deleted (if soft-deleted).	No
doc_created_by	Identifier or name of the user who created the document.	No
doc_created_by_email	Email address of the document creator.	No
doc_created_by_id	Unique ID of the document creator.	No
doc_created_by_name	Full name of the document creator.	No
doc_created_on	Timestamp when the document was created.	No
doc_id	Unique identifier of the document.	No
doc_path	File path or storage path of the document.	No
doc_source_type	Type of source from which the document was ingested.	No
doc_updated_by	Identifier or name of the user who last updated the document.	No
doc_updated_by_email	Email address of the user who updated the document.	No
doc_updated_by_id	Unique ID of the user who last updated the document.	No
doc_updated_on	Timestamp when the document was last updated.	No
downvote_count	Number of down votes received by the item (for example, post, comment, or answer).	No
due_date	The due date or deadline associated with the task or item.	No
extractionMethod	Method used to extract data from the source.	Yes
extractionStrategy	Strategy or approach followed for data extraction	Yes
file_content	Actual text or encoded content of the file.	No
file_image_url	URL to the preview image of the file.	No
file_preview	Short summary or visual preview of the file content.	No
file_title	Title or display name of the file.	No
file_url	Direct URL link to access or download the file.	No
html	Raw HTML version of the document or page content.	No
issueType	Type or category of issue.	No
keywords	List of keywords or tags extracted or assigned to the content.	No
labels	Labels or classifications applied to the item	No
language	Language in which the content is written	No
lastSyncAt	Timestamp of the most recent synchronization with the source system.	No
location	Physical or virtual location associated with the record	No
mentioned_users	List of users mentioned or tagged within the content.	No
message_type	Type of message	No
mime_type	MIME type of the file or document	No
object_created_by_email	Email address of the user who created the object.	No
object_created_by_id	ID of the user who created the object.	No
object_created_by_name	Name of the user who created the object.	No
object_created_on	Timestamp when the object was created.	No
object_type	Type of object	No
organization_id	Unique identifier for the organization.	No
organization_name	Name of the organization associated with the record.	No
owner_email	Email address of the item owner or assignee.	No
owner_id	Unique ID of the item owner or assignee	No
owner_name	Full name of the item owner or assignee.	No
page_body	Text content or body of an HTML page	No
page_count	Number of pages in the document from which the content is ingested.	No
page_html	Page content in HTML format.	No
page_image_url	URL for the page image or thumbnail	No
page_preview	Short preview of the page content.	No
page_title	Title of the page.	No
page_number	Page number of the content	No
page_url	URL of the page or web resource.	No
parent_url	URL of the parent document or source from which this page is derived.	No
parent_name	Name of the parent entity.	No
priority	Priority level of the item.	No
project_description	Description or summary of the project.	No
project_id	Unique identifier for the project.	No
project_name	Name of the project.	No
project_owner_email	Email address of the project owner	No
project_owner_id	ID of the project owner.	No
project_owner_name	Name of the project owner.	No
project_status	Current status of the project.	No
projectName	Name of the project.	No
published_at	Timestamp when the item or content was published.	No
reporter	Identifier or name of the person who reported the issue.	No
reporter_email	Email address of the reporter.	No
reporter_name	Full name of the reporter.	No
repository_id	Unique ID of the code or content repository	No
repository_name	Name of the repository.	No
resource_type	Type of resource	No
share_count	Number of times the item has been shared	No
size	File size or data volume	No
sprint	Sprint or iteration to which the item belongs	No
status	Current status of the item	No
sys_file_type	System-defined file type classification	Yes
sys_racl	Role-based Access Control List defining permissions for the resource.	No
sourceType	Type of content source: web crawl, file upload, or connector.	No
sys_source_name	Name of the system or connector from which the item originated.	Yes
tags	Tags associated with the record for categorization or search.	No
thread_id	Unique identifier of the thread or discussion chain.	No
title	Title or name of the item.	No
updatedBy	Identifier or name of the user who last updated the record.	No
updatedOn	Timestamp when the record was last updated.	No
url	Link to access the resource or item.	No
upvote_count	Number of up votes received by the item.	No
view_count	Number of times the item has been viewed.	No
visibility	Access level of the item.	No
workspace_id	Unique identifier for the workspace or environment.	No
workspace_name	Name of the workspace associated with the item.	No

Custom Fields in Schema

Search AI allows the extension of the Unified Schema by adding up to 50 custom fields, enabling users to include additional data from third-party applications as searchable content. This flexibility ensures that unique business requirements and specialized metadata can be accommodated seamlessly.

Custom fields can also be used in the workbench, where users can map any value to them. For example, they can send the ingested content to an LLM, ask it to summarize the content, and then store the summary in a custom field.

Adding a New Field

To add a new custom field:

1. Click the Manage Schema button on the Manage Content page in the connector.
2. Click the +New Field button.
3. Enter the following fields:
- Display Name - The user-friendly name for the field (for display in the UI only).
- Data Type - The type of value stored in the field. This can be a string or an array.
- Field Name - The technical name of the field. This name is used as a reference in scripts in the document workbench and in the post-processor script for field mapping in connectors. For array-type fields, use cfa1 to cfa5; for string-type fields, use cfs1 to cfs45.
- Description - A brief description of the intended use of the field.
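
The naming rule for Field Name can be checked with a small helper before a name is used in a script. The sketch below is illustrative only: `isValidCustomFieldName` is a hypothetical function, not part of Search AI, and it assumes the ranges are cfa1 to cfa5 for array fields and cfs1 to cfs45 for string fields.

```javascript
// Hypothetical helper (not a Search AI API): checks a custom field name
// against the naming ranges described above.
function isValidCustomFieldName(name, dataType) {
  // "cf", then "a" (array) or "s" (string), then a positive integer
  // with no leading zero.
  const match = /^cf([as])([1-9]\d*)$/.exec(name);
  if (!match) return false;

  const [, kind, digits] = match;
  const index = Number(digits);

  if (dataType === "array") return kind === "a" && index <= 5;
  if (dataType === "string") return kind === "s" && index <= 45;
  return false; // unknown data type
}

console.log(isValidCustomFieldName("cfa3", "array"));   // true
console.log(isValidCustomFieldName("cfs46", "string")); // false
```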

Field Mapping

By default, fields ingested from a connector are automatically mapped to the most appropriate fields in the unified schema. This mapping can be customized to meet specific business requirements.

For example, assume an organization uses a Google Drive connector to ingest documents into Search AI. By default, the Google Drive field createdTime is mapped to the unified schema field createdOn. Suppose the organization also wants to display the last modifying user in search results. To achieve this, update the field mapping to include the Google Drive field lastModifyingUser.displayName and map it to the unified schema field updatedBy.
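
In a post-processor script, such an override might look like the sketch below. It follows the `context` and `raw_json` convention used by the mapping scripts on this page. The `context` object here is a mock with made-up values so the snippet can run on its own, and the fallback to `null` is an assumption rather than documented behavior.

```javascript
// Mock of the object a post-processor script operates on; values are made up.
const context = {
  raw_json: {
    createdTime: "2025-04-04T12:14:32.419Z",
    lastModifyingUser: { displayName: "John Doe" },
  },
};

// Default-style mapping described above: createdTime -> createdOn.
context.createdOn = context?.raw_json?.createdTime;

// Override: expose the last modifying user through updatedBy.
// Optional chaining avoids a TypeError when lastModifyingUser is absent;
// the null fallback is an assumption for this sketch.
context.updatedBy = context?.raw_json?.lastModifyingUser?.displayName ?? null;

console.log(context.updatedBy); // John Doe
```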

Implementing Field Mapping

After an initial sync with a connector, you can view the payload of the response and use it to map the fields as required with the post-processor script.

Go to the Field Mapping tab under Manage Content.
The source payload shows the actual response from the connector. The mandatory fields required by Search AI are listed on the right pane.
Use the source payload and post-processor scripts to map fields from the source applications to the fields of the unified schema. A default script is presented for each connector, which shows how the fields are mapped for the connector by default.

For instance, if the source payload is as follows and you need to map the createdAt field to the doc_created_on field in the unified schema, add the following line to the script.

Source Payload

{
    "incidents": {
        "title": "I : System Outages duplicates ----",
        "content": "System Outages , Impact Start Date : 2025-04-04T12:14:32.419Z, Impact End Date : Mon May 12 2025 10:12:25 GMT+0000 (Coordinated Universal Time), Responders : User : John Doe , Actions : ",
        "type": "incident",
        "id": "79d68c5a-762f-4c0a-b412-49a6d75b92b0",
        "tinyId": "5",
        "status": "open",
        "labels": [
            "System Outages"
        ],
        "createdAt": "2025-04-04T12:14:32.419Z",
        "updatedAt": "2025-04-04T12:14:49.526Z",
        "priority": "P3",
        "responders": "User: John Doe, ",
        "actions": [],
        "impactStartDate": "2025-04-04T12:14:32.419Z",
        "impactEndDate": "2025-05-12T10:12:25.985Z"
    }
}

Script Updates

context.doc_created_on = context?.raw_json?.createdAt;

If a connector supports multiple objects, the source payload displays a concatenated set of fields for all those objects. When mapping fields from two or more supported objects to custom fields, create a separate custom field for each object, as shown below. Even though the records for these objects are distinct, the field mapping section is currently set up to configure them together.

For instance, if a connector supports incidents and alerts, and the titles of these are to be assigned to custom fields, use separate custom fields.

context.cfs1 = context?.raw_json?.incidentTitle;
context.cfs2 = context?.raw_json?.alertTitle;
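
To see why separate custom fields are used, the sketch below runs the same two mapping lines against one mock incident record and one mock alert record. The `mapTitles` wrapper and the sample values are hypothetical and exist only to make the snippet runnable; it also assumes that a record carries only the fields of its own object.

```javascript
// Hypothetical wrapper around the two mapping lines so they can be run
// against mock records outside Search AI.
function mapTitles(context) {
  context.cfs1 = context?.raw_json?.incidentTitle;
  context.cfs2 = context?.raw_json?.alertTitle;
  return context;
}

// Mock records with made-up values: each has only its own object's title.
const incidentRecord = mapTitles({ raw_json: { incidentTitle: "System Outages" } });
const alertRecord = mapTitles({ raw_json: { alertTitle: "CPU usage high" } });

// Each record fills only the custom field that belongs to its object;
// the other field stays undefined because of optional chaining.
console.log(incidentRecord.cfs1, incidentRecord.cfs2); // System Outages undefined
console.log(alertRecord.cfs1, alertRecord.cfs2);       // undefined CPU usage high
```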
