
File-based Dataset Update Solution Instructions

https://docs.guandata.com/article/1/dataset-api.html#nav-39-H2


1. Path and Files

(1) Upload the data files to the following directory on the server:

/home/guandata/data/guandata-store/upload_dataset (non-minio environment)

/home/guandata/data/minio/guandata-store/upload_dataset (minio environment)

(2) Each dataset is stored in its own subdirectory under this directory, named after the dataset's ID (dsId).
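As a sketch of this layout, the following Scala helper resolves the directory for a given dsId and copies locally written Parquet files into it. The names `UploadLayout`, `datasetDir` and `copyParquetFiles` are hypothetical, and the sketch assumes Scala 2.13 (for `scala.jdk.CollectionConverters`) and a local staging directory that contains the files to upload. Whether existing files must be cleared before a full update is not stated in these instructions, so it never deletes anything.

```scala
import java.nio.file.{Files, Path, Paths, StandardCopyOption}
import scala.jdk.CollectionConverters._

object UploadLayout {
  // Base directories listed above; pick the one matching the deployment
  // (plain file storage vs. MinIO).
  val NonMinioBase: Path = Paths.get("/home/guandata/data/guandata-store/upload_dataset")
  val MinioBase: Path = Paths.get("/home/guandata/data/minio/guandata-store/upload_dataset")

  // One subdirectory per dataset: <base>/<dsId>
  def datasetDir(base: Path, dsId: String): Path = base.resolve(dsId)

  // Copies every regular *.parquet file from stagingDir into <base>/<dsId>,
  // creating the dataset directory if needed. Returns the copied target paths.
  def copyParquetFiles(stagingDir: Path, base: Path, dsId: String): Seq[Path] = {
    val target = Files.createDirectories(datasetDir(base, dsId))
    val stream = Files.list(stagingDir)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p) && p.getFileName.toString.endsWith(".parquet"))
        .map(p => Files.copy(p, target.resolve(p.getFileName), StandardCopyOption.REPLACE_EXISTING))
        .toList // force the lazy iterator before the stream is closed
    } finally stream.close()
  }
}
```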

2. File Format

(1) Save the data in Parquet format with Snappy compression, split across a group of files; do not write a single large file.

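(2) The following special characters in field (column) names must be escaped: ' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '=', '.'

'%' is used as the escape character, so '%' itself must also be escaped. Each such character is replaced by '%' followed by its two-digit hexadecimal code (for example, ' ' becomes %20 and '%' becomes %25). Scala example code: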

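object ParquetColumnNames {
  // Characters that must be escaped in column names ('%' is handled separately as the escape character)
  private val forbiddenParquetColumnNameChars = Set(' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '=', '.')

  // Replaces each forbidden character and '%' with '%' plus its two-digit hex code, e.g. ' ' -> "%20"
  def encodeParquetColumnName(name: String): String = name.map { x =>
    if (forbiddenParquetColumnNameChars.contains(x) || x == '%') f"%%${x.toInt}%02x"
    else x.toString
  }.mkString
}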
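To check the escaping or recover an original field name, the encoding can be reversed. The function below is a hypothetical inverse of `encodeParquetColumnName`, not part of the documented API. It assumes every '%' in the input starts a two-digit hexadecimal escape, as the encoder produces, and rejects a truncated escape.

```scala
object ParquetColumnNameDecoding {
  // Hypothetical inverse of encodeParquetColumnName: each "%xx" (two hex digits)
  // becomes the character with that code; all other characters are kept as-is.
  def decodeParquetColumnName(encoded: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < encoded.length) {
      val c = encoded.charAt(i)
      if (c == '%') {
        require(i + 2 < encoded.length, s"truncated escape at index $i")
        // Integer.parseInt throws NumberFormatException on non-hex digits
        sb.append(Integer.parseInt(encoded.substring(i + 1, i + 3), 16).toChar)
        i += 3
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }
}
```

For example, `decodeParquetColumnName("sales%20amount%2e2020")` yields `sales amount.2020`.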

3. Data Update Interface

After the data files have been copied to the path above, call the following API to notify the BI service to update the dataset.

URL

$home_url/public-api/data-source/{dsId}/refresh-with-file

METHOD

GET

Header

Set the x-auth-token header to the token the user receives after logging in to the system. For the login API, see section 1.1 of the document "Using API to Get Login Token".

PARAMETERS

Name      | Location | Type    | Meaning                          | Required | Remarks
dsId      | Path     | String  | Dataset ID                       | Yes      |
overwrite | Query    | Boolean | Whether to perform a full update | Yes      | true: full update; false: incremental update
fileType  | Query    | String  | File type                        | No       | Currently only PARQUET is supported
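A minimal sketch of calling this API from Scala with the JDK 11+ `java.net.http` client. The object and method names are hypothetical; the sketch assumes `dsId` is already URL-safe (no escaping is applied) and sends `fileType=PARQUET` unless `None` is passed.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RefreshWithFile {
  // Builds GET $home_url/public-api/data-source/{dsId}/refresh-with-file
  // with the required overwrite flag, the optional fileType, and x-auth-token.
  def buildRequest(homeUrl: String, dsId: String, overwrite: Boolean, token: String,
                   fileType: Option[String] = Some("PARQUET")): HttpRequest = {
    val query = (Seq(s"overwrite=$overwrite") ++ fileType.map(t => s"fileType=$t")).mkString("&")
    val uri = URI.create(s"${homeUrl.stripSuffix("/")}/public-api/data-source/$dsId/refresh-with-file?$query")
    HttpRequest.newBuilder(uri)
      .header("x-auth-token", token)
      .GET()
      .build()
  }

  // Sends the request and returns the raw JSON body. A "result" of "ok" in that
  // body only means the update task was submitted, not that it finished.
  def send(request: HttpRequest): String =
    HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body()
}
```

For an incremental update, pass `overwrite = false`; `buildRequest` then produces `...?overwrite=false&fileType=PARQUET`.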

Response

{
  "result": "ok",        // "ok": the API call succeeded (this does not mean the dataset update succeeded); "fail": the call failed
  "response": {          // Information about the generated dataset update task on success; null on failure
    "taskId": "9ed75970-4439-11eb-a411-93e5c8edcd0e",  // ID of the generated dataset update task
    "status": "Submitted",
    "result": "Processing"
  },
  "error": {             // Returned only on failure
    "status": 500,                   // Error code
    "message": "Submission failed",  // Error message
    "detail": {}                     // Detailed error information
  }
}
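Because "result": "ok" only confirms that the update task was submitted, client code should not report it as a completed update. A sketch of mirroring the response envelope in Scala types follows; the names are hypothetical, `detail` is omitted, and decoding the JSON into these types is assumed to be done with whatever JSON library the project already uses.

```scala
object RefreshResponse {
  // Mirrors the "response" object returned on success.
  final case class TaskInfo(taskId: String, status: String, result: String)

  // Mirrors the "error" object returned on failure ("detail" omitted).
  final case class ErrorInfo(status: Int, message: String)

  // Mirrors the top-level envelope.
  final case class Envelope(result: String, response: Option[TaskInfo], error: Option[ErrorInfo])

  // Reports "ok" as a submitted task (with its id), never as a finished update.
  def describe(e: Envelope): String = (e.result, e.response, e.error) match {
    case ("ok", Some(task), _)  => s"submitted task ${task.taskId} (status=${task.status})"
    case ("fail", _, Some(err)) => s"failed with ${err.status}: ${err.message}"
    case (other, _, _)          => s"unexpected response (result=$other)"
  }
}
```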