Skip to main content

Remote File

1. Overview

1.1. Function Description

Guandata BI supports integration with FTP/SFTP files and ADLS Gen2 files.

  • FTP/SFTP: Integrating data from FTP/SFTP datasets means that Guandata BI supports actively obtaining data from FTP/SFTP file storage services into BI. For example, users can centrally store data downloaded from third-party platforms or e-commerce channels on an FTP server and integrate it into BI for unified access, ETL processing, and further analysis.

  • ADLS Gen2: Azure Data Lake Storage Gen2 is Microsoft's next-generation big data storage product on Azure, designed for enterprise-level data lake applications. It inherits the ease of use and low cost of Azure Blob Storage, while adding enterprise features such as directory hierarchy and fine-grained permission control. Guandata BI has developed a dedicated connector for ADLS Gen2 to better support Azure ecosystem and customer data integration needs.

1.2. Prerequisites

When integrating remote files, you need to select a data account.

Users who have already created a data account can skip this step. Users who have not created a data account can refer to Data Account for more information.

2. User Guide

The integration process for FTP/SFTP and ADLS Gen2 files is basically the same except for Select Data Table. Please follow the configuration instructions below to complete the integration.

Operation Steps

  1. Enter the "Data Preparation" page and click the "Dataset" module in the left navigation;

  2. Click the "New Dataset" button, select File > Remote File;

  3. Select the FTP/SFTP or ADLS Gen2 connector and configure the relevant items as needed;

  4. After confirming the data table information is correct, click Confirm New to complete the integration.

2.1. Select Connector

Entry: Data Preparation > Dataset > New Dataset > Remote File > FTP/SFTP or ADLS Gen2 connector.

2.gif

2.2. Select Data Table

FTP/SFTP

1. Select Account

  • Select the corresponding data acquisition account;

  • Create a new account;

image.png

2. Select Data Table Type

  • Supports Excel and CSV files (single selection);

  • Supports selection of zip files. When selected, only the corresponding file types after decompression will be used during upload;

3. Select File

  • Only supports browsing files in the corresponding directory under the data account;

  • After selection, the file will automatically enter the selection box on the right;

  • After deselection, the file will be removed from the selection box on the right;

  • Clicking "Clear" in the upper right corner of the selection box will clear all files.

    image.png

Note:
1. If multiple files are selected, the processing logic uses the rules for Excel/CSV file datasets
2. When multiple files are selected, the first selected file will be used as the header, and the other files will only retain data for fields that exist in the first file;
3. Only the first file will be previewed during data preview;
4. Both Excel and CSV files must not exceed 500MB.

4. Data Range Configuration (Optional)

  • Set header: Users can set the dataset header position. The system only extracts data from the header to the last non-empty row. The default is the first row;

  • Select column data: Click "Custom" to specify the column range. Users can specify the column range, and the system will only extract data within the specified range. The default is from column A to the last non-empty column in the header.

  • When uploading multiple sheets with the same header and column data range, you can use the "Sync Configuration" function. The system will synchronize the current configuration scheme to all other datasets.

5.png

Note: The system automatically creates datasets by sheet.

5. New Field (Optional)

When users download files from business systems, the file name will automatically include the current date timestamp. Now, when integrating SFTP/FTP datasets, users can create a "New Field" to synchronize the timestamp into the dataset, thus obtaining key information about the specific date to which the detail data belongs.

  • Supports naming a new field, currently only String type is supported, filled with the file name;

  • In the case of incremental updates to the dataset, the incremental file name will be updated as the incremental value.

6.png

ADLS Gen2

1. Select Account

Select an existing data account from the dropdown. If there is no data account, click Create Account, fill in the URL, Client Id, Client Secret, Tenant Id, and other required parameters, and click "Test Connection".

7.png

2. Enter File Path

Copy the URL address from the ADLS Gen2 file's overview property directory and fill it in here. Only Excel and CSV files are supported.

3. Select Data Table Type:

Only Excel and CSV files are supported.

4. Preview

After completing the above operations, click "Preview". After confirming the data preview is correct, click Next.

8.png

2.3. Set Update Method

FTP/SFTP

1. Data Connection Method: Only extraction is supported.

9.png

2. Deduplication Primary Key (Optional): Optional content is the field list of the first file.

3. Incremental Update: Supports incremental or full update. If not checked, it is a full update. When performing incremental updates, you can select files for update using file name + time macro, e.g., test-file-{{{yesterday}}}.csv

  • Provides quick time macro input;

  • The system will update the data according to the data update cycle;

  • Default update rule: All data matching the file + time macro will be updated to the dataset. Other comparison operations are not supported.

10.png

Update TypeDescription
Full UpdateIf incremental update is not checked: each update will replace all data in the created dataset; the data source for full update is all files uploaded initially; if the corresponding file in the FTP service is missing, an error will be prompted in the dataset details page, indicating the file does not exist.
Incremental UpdateYou must complete the file name input before proceeding: only single selection is allowed in the file box on the left: you can select a folder or a specific file. After selecting a folder/file, the folder path will be displayed: data update will be obtained from which directory; (the default path is the directory path configured in the data account) File name: which data to update; Add parameter: supports global parameters; see Global Parameters. When incrementally updating CSV files, you need to select file encoding and delimiter as in the first step of creation, defaulting to UTF-8 and comma ",".

4. Data Update Cycle: Same as database: manual, daily, weekly, monthly.

5. Schedule Status: When enabled, the dataset will be automatically updated according to the data update cycle configuration. When disabled, the system will stop automatic updates.

11.png

Note: Manual and URL triggers are not affected by the schedule status.

6. Task Priority: Set different priorities for the current scheduled task according to business needs and urgency. The current task priorities include "Highest", "High", "Medium", "Low", or "Lowest". "Highest" means it will be executed first among all tasks.

ADLS Gen2

Except for the "Incremental Update" configuration, the update method configuration for ADLS Gen2 is basically the same as FTP/SFTP. For details, see above.

When configuring the update method for ADLS Gen2, after clicking "Incremental Update", users need to fill in the file path. In the overview property directory of the ADLS Gen2 file, intercept the file path part from the URL and fill it in the corresponding input box. Only CSV and Excel files are supported.

12.png

2.4. Confirm Data Table Information

Enter the dataset name and save path, and you can also modify the field types. Click Confirm New to complete the creation of the current dataset.

13.png

3. Notes

For FTP/SFTP files:

  • When uploading for the first time, you can upload up to 10 files at a time, and up to 200 fields. If there are many historical files to upload, you need to upload them by appending new data in "Update Data";

  • During incremental updates:

    • Directly write matched fields;

    • Discard new fields;

    • Missing fields will be empty.