Skip to main content

Standard Database Connection Guide

Overview

Guandata provides data integration services for a wide range of databases and supports more than 40 standard databases, including but not limited to MySQL, PostgreSQL, Greenplum, SQL Server, Oracle, Presto, Amazon Redshift, MaxCompute, SAP HANA, Teradata, BW, TiDB, Doris, Vertica, Netsuite, ClickHouse, Hive, IBM DB2, HAWQ, AnalyticDB, Gbase 8t, Informix, Kylin, Impala, Sybase, MangoDB, Druid, Trino, DAMENG, Snowflake, StarRocks, CirroData, and Access.

This article explains how to connect a standard database to BI.

Note

Overall Process

Steps

Entry point: Data Preparation > Dataset > Add Dataset > Database

|300

Select the Data Account and Data Table

Choose Create Individually, then select or create a database account. For details, see Data Account.

During data connection setup, Guandata BI supports two operating modes for one or more database tables: SQL Query and Visual Modeling.

  • SQL Query: a traditional query mode intended for IT or technical users, allowing complex SQL queries against one or more database tables.
  • Visual Modeling: a UI-based mode intended for business users. It allows users to work with relationships between tables through a graphical interface, reducing the complexity and technical barrier of data integration.
Note
  1. At present, only MySQL and Impala support Visual Modeling.
  2. MongoDB and SAP BW differ in data structure and query behavior, so their table-selection process differs from the standard flow. See MongoDB and SAP BW Connection Guide.

SQL Query

Select one or more tables from the database table directory on the left, then enter SQL in the editor on the right to specify how the data should be retrieved.

During table configuration, dynamic values such as time macros, global parameters, and user attributes are also supported. At query time, the current values of those parameters are injected into the SQL statement, allowing flexible dimension switching and dynamic queries without manually editing SQL each time.

Note

User Attributes are supported only for Direct Connection datasets. If they are used in Extract datasets, an error occurs.

When a Direct Connection dataset created with user attributes is used in a Card query, the current viewer's user attributes are used in the calculation.

For example, time macros can be used to generate dynamic SQL based on the current date, which helps define time ranges for scheduled refresh and incremental updates:

select date, cost, revenue from table_name where date >='{{{yesterday}}}'
Note

You can click the AI Coding icon on the right to have AI generate SQL automatically. For usage details, see AI Assistant.

For more details, see Dynamic Time Macros and Global Parameters.

Visual Modeling

Visual Modeling is a feature provided by Guandata to reduce the technical barrier of data integration and allow non-technical users to create relationships between tables more easily. After selecting MySQL or Impala when creating a dataset, you can choose Visual Modeling during the database query step.

Note

Currently, only MySQL and Impala support Visual Modeling.

Create Relationships

For already created datasets, Visual Modeling is also supported when updating the model structure.

  1. Drag one or more tables from the database table directory on the left into the relationship canvas.

    • If you select only one table, you can go directly to field configuration and preview the result after selecting one or more fields.
    • If you select multiple tables, drag each subsequent table to the end of the previous one and combine them through table joins or row union into a wide table.

  2. When multiple tables are added, the system supports inner join, left outer join, full join, and row union, and lets users define the join fields as needed.

    Note
    1. Supported operations follow the capabilities of the database itself. For example, if MySQL does not support full join, Visual Modeling does not support it either.
    2. If two tables contain fields with the same name, the system preferentially uses those fields as join keys automatically.
    3. Users can also add, remove, and edit join conditions manually.

    If a table is no longer needed, right-click the current node and choose Delete.

    Note

    This action also removes all child nodes. Proceed carefully.

  3. All relationships must be configured before you can proceed to Field Configuration.

Field Configuration
  1. After relationship creation is complete, proceed to field configuration to select columns and define filter conditions.

  2. After previewing the dataset and confirming there are no issues, continue to the next step.

Data Connection and Update Configuration

Common Configuration

Database Connection Mode
Connection ModeDescriptionApplicable Scenario
Direct ConnectionConnects directly to the user database for query and computation, and BI visualizes the query results returned from the database.Suitable when enterprise security policies do not allow data to land in other systems, when business data changes frequently and users need near real-time data, or when the data volume is large and the customer database itself has strong query performance.
ExtractData is extracted into Guandata BI for computation. Extract supports both full refresh and incremental update.Suitable when the customer does not have a dedicated data platform or warehouse and wants to build a lightweight data warehouse on top of Guandata BI.
Scheduling Status

Scheduling Status controls whether the dataset updates automatically according to the configured update strategy. When scheduling is enabled, the system triggers dataset updates automatically based on the configuration. When it is disabled, automatic updates stop, but users can still update the dataset through other methods such as manual update or URL-triggered update.

Direct Connection Configuration

Cache Validity Period

In Direct Connection mode, setting a cache validity period helps optimize query performance, reduce database pressure, and still ensure that users retrieve relatively fresh data.

Note
  1. To reduce database pressure, identical query SQL uses cache preferentially, and the cache validity period can be configured.
  2. When the cache expires, the system queries the database again and refreshes the cache.

For example, if the cache validity period is set to 10:00 every day, the first query cache generated after 10:00 yesterday remains valid until 10:00 today. Identical queries to the same Card during that period reuse the same cache. After 10:00 today, the next access to the Card triggers a new database query and generates a new cache valid until 10:00 tomorrow.

Real-Time Card Data and Cache Validity Duration

To support real-time analysis, Guandata BI allows Cards and dashboards built from datasets to refresh in near real time through the Real-Time Card Data / Cache Validity Duration setting. By default, the refresh cycle can be configured between 1 minute and 30 minutes, or set to no-cache mode.

For example, if it is set to 1 minute, the related Card automatically fetches and displays the latest data every minute.

Note
  1. Dependency: To achieve real-time refresh on dashboards or large-screen pages, this option must be enabled and the Auto Refresh switch must also be enabled on the related dashboard or large-screen page. Both conditions are required.
  2. Priority: After Cache Validity Duration is configured under Real-Time Card Data, it takes priority over the general Cache Validity Period above.

Extract Configuration

Dataset Update Cycle

In Extract mode, the dataset update cycle is directly tied to the trigger time of the scheduled extract task. Users can define tasks to run at specific times, such as nightly or every Monday. Extraction can be full or incremental depending on data changes and business needs.

Note

A single dataset cannot be updated more than four times per day.

Deduplication Primary Key

The Deduplication Primary Key uses one or more key fields to prevent or remove duplicate data and ensure row uniqueness. This helps avoid duplicate records that could lead to incorrect analysis results.

Note
  1. A similar concept also appears in the Smart ETL Data Deduplication operator. See Data Deduplication.
  2. Choose deduplication fields carefully to avoid removing important data incorrectly.
  3. For large datasets, deduplication may consume considerable compute resources and time.
Full Refresh and Incremental Update

Guandata BI supports both Full Refresh and Incremental Update during extraction and updates.

If Incremental Update is not selected, the system performs Full Refresh by default.

TypeDescriptionApplicable Scenario
Full RefreshFully re-imports all source data into Guandata BI and replaces the dataset content.Suitable when the table structure changes, data needs to be reset, or large-scale migration is required.
Incremental UpdateUpdates or inserts only incremental data into the dataset according to the configured primary key.Suitable for frequently updated or large tables such as transaction logs or user behavior logs.
24h Scheduled Update Density Map

The 24-hour scheduled update density map shows the number and density of scheduled dataset update tasks during a time window. Users can choose relatively idle time slots to avoid task queueing and reduce system pressure. This data is provided for scheduling reference only.

Task Priority

Task priority can be configured to ensure that more important dataset extract tasks are executed first when multiple tasks are scheduled at the same time. The available priorities are Highest, High, Medium, Low, and Lowest, with Medium as the default.

|350

Confirm the Data Table Information

This step covers dataset storage, flexible handling of field comments, sensitive dataset marking, dataset preview, and field attribute adjustment so that data remains consistent and accurate during later analysis, processing, and visualization.

Note
  1. Field type switching is supported only in Extract mode.
  2. If a column contains inconsistent types during extraction, the task does not fail.

Batch Create Databases

Supports bulk creation of tables from the same data account database. This feature allows users to extract all tables from a single database in bulk into the BI platform.
  1. When selecting bulk creation for database connection creation, you can check multiple database tables and edit SQL statements for different tables below the SQL query area. Click Next after finishing all SQL edits.

  2. Configure the connection mode and update frequency for each dataset in the data connection and update settings.

  3. Finally, verify the table information of each dataset one by one and click Confirm to create datasets.

Note

Bulk creation is not supported for MongoDB and SAP BW databases.