Dataset Management & Formatting

Importing data is the first step to using ClearQuery (CQ), since every other feature depends on it. This article walks the user through how to import a dataset and tailor it for the best user experience.

Note: If this is the user's first time uploading data into CQ, the user will be directed to the Datasets Management page immediately upon login. If the user does not have a dataset ready to upload, the system offers three (3) sample datasets to help the user get familiar with CQ. They are available from the sample dataset drop-down at the bottom of the page.

To Add a New Dataset:

1. Select "Datasets Management" from the Feature Navigation Bar.
2. Click "Add New Dataset" in the top right of the screen.
3. Click on the desired dataset connector / import feature.

ClearQuery offers several "out-of-the-box" connectors, as well as the potential for custom connections depending on the CQ agreement (whether it's SaaS or a custom deployment). To import data, click the desired connector or file type on the screen. For additional data connectors, please reach out to your administrator or account manager.

CQ will walk the user through the steps needed to upload or connect the data, including entering credentials where required, whether that means selecting a file to upload or linking ClearQuery to an external dataset via a Connector. For step-by-step guidance on connecting to external data sources, please review the Data Connector Guide.

By default, ClearQuery supports the following file types and maximum sizes for manually uploaded files:
- CSV / TSV up to 2 GB
- JSON / NDJSON up to 2 GB
- Parquet up to 100 MB
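
The limits above can be checked locally before starting an upload. The following is a minimal sketch, not part of ClearQuery: the function name `check_upload`, the extension spellings, and the choice to treat 1 GB as 1024^3 bytes are all assumptions made for illustration.

```python
from pathlib import Path

# Limits copied from the list above. Treating 1 GB as 1024**3 bytes is an
# assumption; ClearQuery may count sizes differently.
MB = 1024 ** 2
GB = 1024 ** 3
SIZE_LIMITS = {
    ".csv": 2 * GB,
    ".tsv": 2 * GB,
    ".json": 2 * GB,
    ".ndjson": 2 * GB,
    ".parquet": 100 * MB,
}


def check_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for a file name and its size in bytes."""
    ext = Path(filename).suffix.lower()
    limit = SIZE_LIMITS.get(ext)
    if limit is None:
        return False, f"unsupported file type: {ext or '(none)'}"
    if size_bytes > limit:
        return False, f"{ext} file exceeds the {limit // MB} MB limit"
    return True, "ok"
```

For a file on disk, the size can be read with `Path(filename).stat().st_size` and passed as `size_bytes`.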

Note: If utilizing ClearQuery SaaS, the source of the data needs to be accessible via the public internet.

4. Click "Next" to proceed to configuring the dataset.
5. If necessary, configure the field types associated with the dataset. Field types are described below:

Configure Dataset:

This screen allows the user to accomplish several things:

Name the dataset: this is how it will be displayed when working within this dataset and how it will be stored in ClearQuery.
Assign Field Types to the various columns within the dataset. Field types tell ClearQuery how to interact with and display the data within each field. There are several Field Type Classifications:
- Categorical - a variable that can take on one of a limited, and usually fixed, number of possible values.
- Date - a variable that denotes a specific point in time.
- Decimal - a number with decimal points.
- Identifier - machine-generated identifiers that the user does not want to run analytics against (ingested but not displayed).
- Number - any whole number.
- Text - alphanumeric data that does not meet the requirements of the Categorical or Number field types.
- Percent - a specified amount in or for every hundred.
- Geopoint - location coordinates.
- Ignore - this selection tells CQ to ignore this column within the dataset (not ingested).
- Auto - CQ analyzes the provided inputs and chooses the best available option.
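
To get a feel for which field type a column is likely to need, a few sample values can be inspected before upload. The sketch below is purely illustrative and is not how the Auto option works internally; the function name `guess_field_type`, the `YYYY-MM-DD` date format, and the "half or fewer distinct values" rule for Categorical are all assumptions.

```python
from datetime import datetime


def guess_field_type(values: list[str]) -> str:
    """Guess a field type name from sample string values (illustrative only)."""
    samples = [v.strip() for v in values if v and v.strip()]
    if not samples:
        return "Text"

    def all_match(parse) -> bool:
        # True when every sample parses without raising ValueError.
        try:
            for s in samples:
                parse(s)
            return True
        except ValueError:
            return False

    if all(s.endswith("%") for s in samples) and all_match(lambda s: float(s[:-1])):
        return "Percent"
    if all_match(int):
        return "Number"
    if all_match(float):
        return "Decimal"
    # Assumed date format; real data may need other formats.
    if all_match(lambda s: datetime.strptime(s, "%Y-%m-%d")):
        return "Date"
    # Assumed rule: few distinct values relative to the sample size.
    if len(set(samples)) <= max(1, len(samples) // 2):
        return "Categorical"
    return "Text"
```

Identifier, Geopoint, and Ignore are left out because they depend on what the user intends to do with a column rather than on the shape of its values.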

6. The display configuration column tells CQ where the user wants certain columns displayed.

Note: Not all columns will provide valuable charts or graphs; this setting allows the user to optimize their experience.

7. Custom Fields:

ClearQuery allows users to create custom fields from preexisting fields contained within their datasets during ingest. For example, if a dataset has separate latitude and longitude fields, the user can have the system combine them during ingest to create a geopoint field. Another example would be combining first and last name fields to create a full name field.

To do this:

Select the “Custom Fields” Tab under the Import New Dataset Step Tracker (Top of Image 8).
Assign a name to the new field (the desired name for the custom field).
Select the desired processor type (note: the new field type is listed in parentheses below):
- Clean Number (categorical) – removes all characters that are not digits from the selected field.
- Combine Fields (categorical) – select multiple fields to be combined into a single field.
- GeoPoint (geopoint) – used to combine latitude and longitude fields. Make sure to select the latitude field first so the new field is oriented appropriately.
- Lowercase (categorical) – converts all characters in the field to lowercase.
- Strip HTML (text) – removes all HTML and XHTML tags from within the field.
- Split String* (categorical) – splits a delimited string, such as a comma-separated list, into an array of values.
- Replace All* (text) – within a field the system will replace the defined value with the specified replacement value.

*Supports Regex
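
The effect of each processor can be approximated in a few lines of Python. This is a rough sketch of the behavior described above, not ClearQuery's implementation: the function names, the default separator for combined fields, and the naive tag-stripping pattern are assumptions.

```python
import re


def clean_number(value: str) -> str:
    """Clean Number: remove every character that is not a digit."""
    return re.sub(r"\D", "", value)


def combine_fields(values: list[str], separator: str = " ") -> str:
    """Combine Fields: join several values (the separator is an assumption)."""
    return separator.join(values)


def geopoint(latitude: str, longitude: str) -> tuple[float, float]:
    """GeoPoint: latitude first, then longitude."""
    return (float(latitude), float(longitude))


def lowercase(value: str) -> str:
    """Lowercase: convert all characters to lowercase."""
    return value.lower()


def strip_html(value: str) -> str:
    """Strip HTML: naive tag removal; a real parser handles more cases."""
    return re.sub(r"<[^>]+>", "", value)


def split_string(value: str, pattern: str = ",") -> list[str]:
    """Split String: split on a regex pattern into a list of values."""
    return re.split(pattern, value)


def replace_all(value: str, pattern: str, replacement: str) -> str:
    """Replace All: replace every regex match with the replacement."""
    return re.sub(pattern, replacement, value)
```

For example, `clean_number("(555) 123-4567")` yields `"5551234567"`, and `split_string("a, b,c", r",\s*")` yields `["a", "b", "c"]`. Note that removing all non-digits also removes decimal points and minus signs.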

If you want to combine another set of fields, select the “Add Field” button.
- Note: as users add additional custom fields, the previously created fields can be used to create additional new custom fields. For example, custom field 1 can be used to create custom field 2.
Click “Next” to continue the upload.
- If the user needs to continue configuring the dataset, navigate back to the “Configure” Tab.

8. Once you have completed the configuration, click “Next.”

9. Set Access:

Here the user can assign permissions to the dataset:
- Owners – can analyze, edit, and delete the dataset
- Readers – analysis only

Configure Cross Dataset Search access:
- Discoverability – whether this dataset can be found through Cross Dataset Search
  - If so, which information the system can display to users without access:
    - Label
    - Description
    - Tags

Click “Next” to proceed.

10. The system will now import and analyze the dataset. Once the progress graphic reaches 100%, click “See Data.”

If the system is ingesting a large amount of data, the user can navigate away from this screen and the system will provide a progress bar showing the status of the upload.

Adding additional data (rows/records) to a dataset:

Users can manually add additional data to a dataset within CQ by:

Navigating to the Dataset Management page.
Hovering over the “Action” dropdown for the specific dataset you want to add data to.
Clicking the “Add More Data” option.
Following the .csv file upload prompts.
- Note:
  - The data schema of the new file needs to match the existing dataset schema.
  - The system will ignore duplicate entries, meaning the original data can be included in the new upload if necessary.
  - This is not a way to edit pre-existing data within the system, as existing rows will not be updated.
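
The notes above can be checked locally before uploading an additional file. The sketch below is a hypothetical helper, not a ClearQuery feature. It assumes that "matching schema" means identical column headers in the same order and that a "duplicate" is a row identical to an existing row in every column; ClearQuery's actual rules may differ.

```python
import csv
import io


def check_append(existing_csv: str, new_csv: str) -> tuple[bool, int]:
    """Return (schema_matches, number_of_rows_that_would_be_new)."""
    existing = list(csv.reader(io.StringIO(existing_csv)))
    new = list(csv.reader(io.StringIO(new_csv)))
    # Assumption: the schema matches when the header rows are identical.
    if not existing or not new or existing[0] != new[0]:
        return False, 0
    # Assumption: a duplicate is an exact, full-row match.
    seen = {tuple(row) for row in existing[1:]}
    fresh = {tuple(row) for row in new[1:]} - seen
    return True, len(fresh)


existing = "id,name\n1,Ada\n2,Grace\n"
addition = "id,name\n2,Grace\n3,Linus\n"
print(check_append(existing, addition))  # (True, 1)
```

In this example the row `2,Grace` already exists, so only `3,Linus` would count as new. A row such as `2,Grace Hopper` would be treated as a separate new row rather than an update to the existing one.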

If you have additional questions regarding this topic, please reach out to your Account Manager or contact ClearQuery Support (support@clearquery.io).