
Multi-source Data Integration Overview

1. Overview

Guandata BI provides comprehensive data integration capabilities for enterprise data that is scattered across systems, varies in form, and comes from multiple heterogeneous sources. By applying unified data definitions, it builds a data pool that gives a complete view of the data and breaks down data silos.

Guandata currently integrates many types of data, including databases, business application systems, and files, through JDBC, API integration, and remote file services.


2. Getting Started

Concept Description:

A connector is a tool that links a data source to a data processing platform so that datasets can be accessed and used correctly. Connectors are available for many data source types.

A dataset is the basic unit for storing and managing data. Through connectors, users obtain data from various data sources and form datasets, which Guandata BI can then use for computation, analysis, and processing.
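To make the relationship between the two concepts concrete, here is a minimal Python sketch. It is not Guandata BI's API: the `Dataset` class and the `csv_connector` and `sqlite_connector` functions are hypothetical names invented for illustration, and an in-memory SQLite database stands in for a real database source. The sketch only shows the idea that connectors for different source types produce datasets of one common shape.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset: named columns plus rows."""
    columns: list
    rows: list


def csv_connector(text):
    """Hypothetical file-type connector: parse CSV text into a Dataset."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line holds the column names
    return Dataset(columns=header, rows=[tuple(r) for r in reader])


def sqlite_connector(conn, query):
    """Hypothetical database-type connector: run a query and wrap the result."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    return Dataset(columns=columns, rows=cur.fetchall())


# Two different sources, one common dataset shape.
file_ds = csv_connector("region,sales\nEast,120\nWest,95\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("East", 120), ("West", 95)])
db_ds = sqlite_connector(conn, "SELECT region, sales FROM orders ORDER BY region")
```

Note that the CSV connector in this sketch returns every value as a string, while the database connector returns typed values. That difference is one reason a field-confirmation step is useful before a dataset is used for analysis.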

2.1. Core Process


1. Select Data Account: Guandata BI provides many connectors, covering database, file, and application data sources. Users can filter by data source type and select a data account they have permission to use.

Note: Unneeded data connection methods can be hidden at system deployment time so that only the required connectors are displayed.

2. Select Data Table: After connecting to the data account, users see the data tables available in it and select one or more to work with. These may be sheets in a file, tables in a database, and so on. The page shown depends on the data source, and for some sources this step does not appear.

3. Data Connection and Update Settings: Users configure the connection method (direct connection or extraction), scheduling status, data update cycle, and related settings. The available options depend on the database type and the connection method chosen. These settings determine how accurately and efficiently the data is kept up to date.

4. Confirm Data Table Information: Finally, users confirm the field information of the selected data tables. They can view and edit field names, data types, and other attributes so that the table information matches expectations and the data is parsed and displayed correctly after integration.
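The four steps above can be pictured as filling in one configuration record. The Python sketch below is illustrative only and does not reflect Guandata BI's internal model or API: `ConnectionMode`, `FieldInfo`, `DatasetRequest`, and the example values are all hypothetical, and the consistency checks are assumptions about what a reasonable configuration looks like.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConnectionMode(Enum):
    # Assumed meanings, for illustration only.
    DIRECT = "direct"    # queries run against the source itself
    EXTRACT = "extract"  # data is copied into the platform and refreshed


@dataclass
class FieldInfo:
    name: str
    data_type: str


@dataclass
class DatasetRequest:
    """Hypothetical record of the four steps; not a Guandata BI API object."""
    data_account: str                # step 1: selected data account
    tables: list                     # step 2: may be empty if the step is skipped
    mode: ConnectionMode             # step 3: direct connection or extraction
    schedule_enabled: bool = False   # step 3: scheduling status
    update_cycle: str = ""           # step 3: for example "daily"
    fields: list = field(default_factory=list)  # step 4: confirmed fields

    def validate(self):
        """Return a list of problems; an empty list means nothing was flagged."""
        problems = []
        if not self.data_account:
            problems.append("no data account selected")
        if (self.mode is ConnectionMode.EXTRACT
                and self.schedule_enabled and not self.update_cycle):
            problems.append("scheduling is on but no update cycle is set")
        names = [f.name for f in self.fields]
        if len(names) != len(set(names)):
            problems.append("duplicate field names")
        return problems


request = DatasetRequest(
    data_account="sales_mysql",
    tables=["orders"],
    mode=ConnectionMode.EXTRACT,
    schedule_enabled=True,
    fields=[FieldInfo("order_id", "int"), FieldInfo("amount", "decimal")],
)
problems = request.validate()  # flags the missing update cycle
```

Setting `request.update_cycle = "daily"` and calling `validate()` again returns an empty list in this sketch.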

2.2. Learning Path

| Category | Integration Data Source Type | Description |
| --- | --- | --- |
| File Type | File Data | Imports data from files such as Excel and CSV for data processing. |
| | Online Document Data | Integrates with Feishu spreadsheets so users can import and sync their data as online document datasets. |
| | Remote File Data | Integrates file data from remote file storage servers, such as FTP/SFTP and ADLS Gen2. |
| Database Type | Database | Connects to MySQL, PostgreSQL, Greenplum, SQL Server, Oracle, and 40+ other databases. Also supports self-service integration with cloud-vendor databases, domestic databases, and other external databases. |
| | Stored Procedure Data | Integrates with stored procedures from Oracle, MySQL, SQL Server, and others. Datasets are created through parameterized extraction, and pages can run parameterized dynamic queries against stored procedure datasets. |
| Application Type | Web Service | Integrates API data through Web Service, with flexible custom parsing rules for the data an API returns and selection of the required fields. |
| | Account Dataset | Synchronizes account data from commonly used OA systems so that account data is integrated between the enterprise OA system and the Guandata data analysis platform. Currently supports WeChat Work, DingTalk, LDAP, Feishu, and others. |
| | Card Dataset | Creates datasets that use cards as the data source, so card analysis results can be processed and analyzed further. |
| | Universe Dataset | Integrates data sources from Universe databases. |
| | View Dataset | A dynamically parameterizable, executable dataset based on SparkSQL. Users dynamically join and compute over one or more non-direct-connection datasets (excluding real-time datasets) and reorganize them into a new dataset. |
| | Form Dataset | Provides multi-terminal form entry for data collection. Users enter data directly in Guandata BI, including template maintenance and collection summarization. Collected form data can be brought quickly into Guandata's BI analysis platform for visualization, closing the loop from feedback collection through ETL to data presentation. |
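The Web Service entry mentions configurable parsing rules and field selection for API responses. As a rough illustration of that idea, the Python sketch below applies a simple rule to a JSON response: a path to the list of records plus the fields to keep. The `parse_response` function, its parameters, and the sample response are hypothetical and are not how Guandata BI expresses its parsing rules.

```python
import json


def parse_response(body, records_path, selected_fields):
    """Apply a hypothetical parsing rule to an API response body.

    records_path: keys leading from the JSON root to the list of records.
    selected_fields: record keys to keep, in output column order.
    """
    node = json.loads(body)
    for key in records_path:
        node = node[key]
    # Missing keys become None so every row has the same columns.
    return [tuple(record.get(name) for name in selected_fields) for record in node]


# Made-up response body, used only to exercise the sketch.
body = (
    '{"code": 0, "data": {"items": ['
    '{"id": 1, "name": "A", "extra": "x"}, '
    '{"id": 2, "name": "B"}]}}'
)
rows = parse_response(body, ["data", "items"], ["id", "name"])
```

In this sketch the rule is plain data (a key path and a field list), so it could be stored alongside the dataset definition and reapplied on every scheduled update.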