File-based Dataset Update Solution Instructions
For the latest and complete usage instructions, see:
https://docs.guandata.com/article/1/dataset-api.html#nav-39-H2
1. Path and Files
(1) Upload the data files to one of the following directories on the server, depending on the environment:
/home/guandata/data/guandata-store/upload_dataset (non-minio environment)
/home/guandata/data/minio/guandata-store/upload_dataset (minio environment)
(2) Each dataset has its own subdirectory under the upload directory. The subdirectory is named after the dataset's ID (dsId).
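The directory layout above can be sketched as a small helper. This is only an illustration: the object name UploadPaths and the usesMinio flag are hypothetical, and only the two root directories come from these instructions.

```scala
object UploadPaths {
  // Root directories quoted from the instructions above.
  val NonMinioRoot = "/home/guandata/data/guandata-store/upload_dataset"
  val MinioRoot    = "/home/guandata/data/minio/guandata-store/upload_dataset"

  /** Directory that holds the files of one dataset.
    * `usesMinio` is a hypothetical flag; set it according to your deployment.
    */
  def datasetDir(dsId: String, usesMinio: Boolean): String = {
    require(dsId.nonEmpty, "dsId must not be empty")
    val root = if (usesMinio) MinioRoot else NonMinioRoot
    s"$root/$dsId"
  }
}
```

For example, UploadPaths.datasetDir("abc123", usesMinio = false) yields /home/guandata/data/guandata-store/upload_dataset/abc123, where abc123 is a placeholder dsId.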
2. File Format
(1) Save the data in Parquet format with Snappy compression. Write it as a group of files rather than as one large file.
(2) The following special characters in field names must be escaped: ' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '=', '.'
The escape character is '%', so '%' itself must also be escaped. Each escaped character is replaced by '%' followed by its two-digit hexadecimal character code. Example Scala code:
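// Characters that are not allowed in Parquet column names.
private val forbiddenParquetColumnNameChars = Set(' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '=', '.')

// Replaces each forbidden character, and '%' itself, with '%' plus its two-digit hex code.
def encodeParquetColumnName(name: String): String = name.map { x =>
  if (forbiddenParquetColumnNameChars.contains(x) || x == '%') {
    f"%%${x.toInt}%02x"
  } else x.toString
}.mkString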
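To make the escaping rule concrete, here is a self-contained sketch that repeats the encoder and adds a decoder. The decoder is not part of the original instructions. It is a hypothetical inverse, included only to show that the encoding is reversible, and the object name ParquetColumnNames is likewise an assumption.

```scala
object ParquetColumnNames {
  // Same forbidden set as in the example above.
  private val forbidden = Set(' ', ',', ';', '{', '}', '(', ')', '\n', '\t', '=', '.')

  // Forbidden characters and '%' become '%' plus their two-digit hex code.
  def encode(name: String): String = name.map { c =>
    if (forbidden.contains(c) || c == '%') f"%%${c.toInt}%02x" else c.toString
  }.mkString

  // Hypothetical inverse (an assumption, not a documented API):
  // every "%hh" sequence is turned back into the character it encodes.
  def decode(encoded: String): String =
    "%([0-9a-fA-F]{2})".r.replaceAllIn(encoded, m =>
      java.util.regex.Matcher.quoteReplacement(
        Integer.parseInt(m.group(1), 16).toChar.toString))
}
```

With this sketch, a field named order.id is written as order%2eid, and rate% is written as rate%25. Because every literal '%' is itself escaped, decoding never misreads an encoded name.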
3. Data Update Interface
After copying the data files to the specified path, call the following interface to notify the BI service to update the dataset.
URL
$home_url/public-api/data-source/{dsId}/refresh-with-file
METHOD
GET
Header
Add x-auth-token. Its value is the token the user obtains by logging in to the system. For the login interface, see section 1.1 of the following document: Using API to Get Login Token.
PARAMETERS
Name | Location | Type | Meaning | Required | Remarks |
dsId | Path | String | Dataset ID | Yes | |
overwrite | Query | Boolean | Whether to perform a full update | Yes | true: full update, false: incremental update |
fileType | Query | String | File type | No | Currently only PARQUET is supported |
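The request described above could be assembled roughly as follows, using the JDK's built-in HTTP client types (Java 11 or later). This is a sketch under assumptions: the object name RefreshWithFile, the helper names, and the example host are hypothetical. Only the path, the query parameters, and the x-auth-token header come from these instructions.

```scala
import java.net.URI
import java.net.http.HttpRequest

object RefreshWithFile {
  /** Builds the refresh URL. `homeUrl` stands for $home_url in the instructions. */
  def refreshUri(homeUrl: String,
                 dsId: String,
                 overwrite: Boolean,
                 fileType: Option[String] = Some("PARQUET")): URI = {
    val base = homeUrl.stripSuffix("/")
    // overwrite is required; fileType is optional and only added when present.
    val query = (Seq(s"overwrite=$overwrite") ++ fileType.map(t => s"fileType=$t")).mkString("&")
    URI.create(s"$base/public-api/data-source/$dsId/refresh-with-file?$query")
  }

  /** Builds a GET request that carries the login token in the x-auth-token header. */
  def refreshRequest(uri: URI, token: String): HttpRequest =
    HttpRequest.newBuilder(uri).header("x-auth-token", token).GET().build()

  // To send it, something like the following should work:
  //   java.net.http.HttpClient.newHttpClient()
  //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString())
}
```

For a full update of a dataset with the placeholder ID abc123 on the placeholder host https://bi.example.com, refreshUri produces https://bi.example.com/public-api/data-source/abc123/refresh-with-file?overwrite=true&fileType=PARQUET.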
Response
{
  "result": "ok",  // "ok": interface call successful (doesn't mean dataset update successful), "fail": failed
  "response":      // Returns generated dataset update task related information when successful, null when failed
  {
    "taskId": "9ed75970-4439-11eb-a411-93e5c8edcd0e",  // Generated dataset update task id
    "status": "Submitted",
    "result": "Processing"
  },
  "error":         // Returns when failed
  {
    "status": 500,                    // Error code
    "message": "Submission failed",   // Error message
    "detail": {}                      // Detailed error information
  }
}
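One way to handle this response in client code is to model it with case classes and reduce it to either an error description or a task ID. The sketch below assumes the JSON has already been parsed by a JSON library of your choice. All type and function names are hypothetical, and the detail field is left out for brevity.

```scala
// Hypothetical model of the response shown above.
final case class TaskInfo(taskId: String, status: String, result: String)
final case class ApiError(status: Int, message: String)
final case class RefreshResponse(result: String,
                                 response: Option[TaskInfo],
                                 error: Option[ApiError])

object RefreshOutcome {
  /** Right(taskId) when the call was accepted, Left(description) otherwise.
    * "ok" only means the update task was submitted, not that the update finished.
    */
  def of(r: RefreshResponse): Either[String, String] =
    (r.result, r.response, r.error) match {
      case ("ok", Some(task), _) => Right(task.taskId)
      case (_, _, Some(err))     => Left(s"${err.status}: ${err.message}")
      case _                     => Left(s"unexpected response: result=${r.result}")
    }
}
```

Returning an Either keeps the two cases apart: the caller can record the task ID on success, or log the error code and message on failure.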