Ingest Data API¶
This API allows you to ingest and index data into the SearchAI application. You can directly ingest structured data as chunk fields, ingest the content of an uploaded document, or perform incremental web crawling of web sources that already exist in the application.
Ingesting Documents¶
- To ingest content from a file, use the Upload File API to upload your file to the application.
- After uploading, include the `fileId` from the Upload File API response in the Ingest API request to process the file content (a sketch of the response follows this list).
- Currently, only PDF, DOCX, PPT, and TXT files are supported. If any other file type is sent for ingestion, the API throws an error.
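The Upload File API response includes the `fileId` to pass on to this API. The sketch below is purely illustrative; the exact response fields are not documented on this page, and all values shown are placeholders:

```json
{
  "fileId": "63f1a2b4c5d6e7f8a9b0c1d2",
  "fileName": "policy-handbook.pdf"
}
```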
Ingesting Structured Data¶
- To ingest structured data, add the content to the body of the API request. Ensure that the data corresponds to the Chunk Fields listed in the table below.
- File Structure: The JSON file must adhere to a specific structure for SearchAI to interpret the data correctly (see the illustrative example after this list):
    - The file name is used as the `recordTitle`.
    - The JSON file should consist of an array of objects, where each object represents a chunk of data.
    - The fields in each chunk must correspond to the chunk fields listed in the table below.
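For example, a file named `hr-policies.json` (the file name becomes the `recordTitle`) might contain an array like the following. The field names here are illustrative, borrowed from the chunk fields used in the sample request later on this page; your fields must match the chunk fields defined for your application:

```json
[
  {
    "chunkTitle": "Remote Work Policy",
    "chunkText": "Employees may work remotely up to three days per week with manager approval.",
    "recordUrl": "https://intranet.example.com/policies/remote-work"
  },
  {
    "chunkTitle": "Leave Policy",
    "chunkText": "Full-time employees accrue 20 days of paid leave per year."
  }
]
```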
Crawling Web Pages¶
- This API can also be used for incremental web crawling. The ingested content is added to an existing web source in SearchAI.
- The `sourceName` in the API must match the Source Title of the web domain added in SearchAI.
- Set the `sourceType` to `web`.
- Provide the URLs of the pages to be crawled in the `urls` array under the `documents` field.
- The web crawl uses the crawl configuration set for the source in SearchAI.
- If an existing URL is provided, it is recrawled. If a new URL is provided, it is crawled if the crawl configuration permits. (See the sample request below.)
API Specifications¶
| Method | POST |
|---|---|
| Endpoint | `https://{{host}}/api/public/bot/:botId/ingest-data` |
| Content Type | `application/json` |
| Authorization | `auth: {{JWT}}` |
| API Scope | |
Query Parameters¶
| PARAMETER | REQUIRED | DESCRIPTION |
|---|---|---|
| host | Required | The environment URL. For example, `https://platform.kore.ai` |
| Bot ID | Required | Unique identifier of your application. The Bot ID corresponds to the App ID of your application. To view your App ID, go to Dev Tools under App Settings; the App ID is listed under API Scopes. |
Request Parameters¶
| PARAMETER | REQUIRED | DESCRIPTION |
|---|---|---|
| sourceName | Yes | The name of the source. This field is mandatory; if the given name does not exist, a new source is created automatically. |
| sourceType | Yes | The type of ingestion. The sample requests below use `json` (for ingesting chunk fields) and `web` (for incremental web crawling). |
| documents | Yes | Depending upon the value of sourceType, this field is used for: 1. passing the chunk fields in JSON format; 2. passing the reference (`fileId`) of the file containing the chunk fields in JSON format; 3. passing the web URLs to be crawled. |
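Regardless of the ingestion mode, the request body combines these three parameters in the same top-level shape; only the contents of `documents` change, as the sample requests below illustrate. The values here are placeholders:

```json
{
  "sourceName": "mySource",
  "sourceType": "json",
  "documents": []
}
```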
Sample Request - Ingesting Chunks directly¶
For ingesting chunks directly, use the following format.
```json
{
  "sourceName": "Abc",
  "sourceType": "json",
  "documents": [
    {
      "title": "Cybersecurity",
      "chunks": [
        {
          "chunkText": "Cybersecurity is the practice of protecting systems, networks, and programs from digital attacks. With the rise of cyber threats like ransomware and data breaches, cybersecurity has become a critical concern for businesses and governments worldwide.",
          "recordUrl": "https://www.cybersafe.com/",
          "chunkTitle": "The Importance of Cybersecurity",
          "chunkMeta": {
            "Role": "Dev"
          }
        }
      ]
    }
  ]
}
```
Note that the fields inside each chunk object should correspond to the chunk fields. To view the chunk fields, refer to the Chunk Browser.
Sample Request - Incremental Web Crawl¶
For crawling web pages, use the following format.
```json
{
  "sourceName": "myWebDomain",
  "sourceType": "web",
  "documents": [
    {
      "urls": [
        "https://www.shell.de/geschaeftskunden/energie/shell-energy-blog/energieloesungen-fuer-den-mittelstand.html",
        "https://www.shell.de/ueber-uns/standorte/rheinland/medieninfos-und-downloads/neue-silhouette-als-zeichen-des-wandels-ueber-50-meter-hohe-kolonne-fuer-die-neue-grundoel-anlage-bei-shell.html"
      ]
    }
  ]
}
```
Note that the `urls` field should contain the list of URLs to be crawled. If a URL has already been crawled, it is recrawled. If a URL is new, it is crawled if the crawl configuration of the source permits.
Sample Request - Ingesting Content from Files¶
For ingesting content from a file, pass the `fileId` of the uploaded file in the `documents` field, where `fileId` is the unique identifier of the uploaded file. Use the Upload File API to upload the file to the application; it returns the `fileId` in its response, which should then be used in the Ingest API to ingest and index the content of the file.
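A minimal sketch of such a request, modeled on the samples above: the `sourceName` and `fileId` values are placeholders, and the `sourceType` value of `file` is an assumption, since this page does not show the exact value for file-based ingestion:

```json
{
  "sourceName": "myFileSource",
  "sourceType": "file",
  "documents": [
    {
      "fileId": "63f1a2b4c5d6e7f8a9b0c1d2"
    }
  ]
}
```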