Analytics Environment Resources
In the Analytics Environment, there are two types of data storage resources available:
Cloud data warehouse databases (BigQuery datasets)
Cloud object storage buckets (GCS buckets)
BigQuery Datasets
Your BigQuery instance comes with predefined datasets (within the Resources area on the left) that can be used to store different types of data, depending on your user persona:
Note
In your BigQuery instance, the names for these datasets start with your account name and end with the suffixes shown below (for example, "<organization_name>_wh").
Suffix | Access Type | Description | Data Scientists | Data Analysts |
---|---|---|---|---|
_ai | Read, write | Holds input and output tables used in machine learning (ML) models (analytics insights). | Yes | No |
_pub | Read, write | Holds tables that need to be published to Tableau or marked for granting permissions. | Yes | No |
_rdm | Read-only | For tenants with retail data, holds sales transaction data (retail data model). | Yes | No |
_rp | Read-only | Holds tables to use for creating Tableau reports. | No | Yes |
_sg | Read, write | Holds segment tables to send to Customer Profiles. | Yes | No |
_wh | Read-only | Holds warehouse tables that store your ingested data. | Yes | No |
_work | Read, write | Holds tables as a workspace for use by your organization (not shared with other tenants). By default, there is no expiration policy for tables in this dataset. | Yes | No |
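For example, a data scientist might copy a table from the read-only _wh dataset into the read/write _work dataset for exploration. The sketch below uses the BigQuery Python client; the "acme" prefix and the "orders" table are hypothetical placeholders for your own account prefix and table names.

```python
# Minimal sketch: copy a warehouse table into the _work dataset for exploration.
# "acme" and "orders" are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

ORG = "acme"  # hypothetical account/organization prefix

sql = f"""
CREATE OR REPLACE TABLE `{ORG}_work.orders_sample` AS
SELECT *
FROM `{ORG}_wh.orders`  -- _wh is read-only; _work is read/write
LIMIT 1000
"""

client.query(sql).result()  # wait for the copy job to finish
```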
Customer Profiles Segment Data
The following tables in the _wh (warehouse) dataset store your Customer Profiles segment data:
customer_profiles_segments: Contains the segment data (members) sent to the Analytics Environment from Customer Profiles. Use this table to perform campaign measurements and develop insights.
customer_profiles_segment_reference: Contains metadata for distributed segments that is automatically sent to the Analytics Environment, including the segment name, counts, distribution destinations, and timestamps.
customer_profiles_segments table:
Column name | Data Type | Description |
---|---|---|
distributed_timestamp | Timestamp | The distribution date, which depends on the distribution mode. For information about these modes, see "Distribute to Analytics Environment." |
segment_name | String | The name of the segment |
test_flag | Boolean | Indicates whether a RampID is in the segment's test group (True) or its control group (False) |
user_id | String | The RampID associated with the record in the segment |
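As a sketch of how the customer_profiles_segments table supports campaign measurement, the query below compares test and control group sizes for one segment. The "acme" dataset prefix and the segment name are hypothetical placeholders.

```python
# Minimal sketch: count RampIDs in the test and control groups of one segment.
# The "acme" prefix and segment name are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  segment_name,
  test_flag,
  COUNT(DISTINCT user_id) AS ramp_id_count
FROM `acme_wh.customer_profiles_segments`
WHERE segment_name = @segment_name
GROUP BY segment_name, test_flag
"""

job = client.query(
    sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(
                "segment_name", "STRING", "spring_promo_audience"
            )
        ]
    ),
)

for row in job.result():
    group = "test" if row["test_flag"] else "control"
    print(f"{row['segment_name']}: {row['ramp_id_count']} RampIDs in the {group} group")
```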
customer_profiles_segment_reference table:
Column name | Data Type | Description |
---|---|---|
alwayson_enabled | Boolean | Indicates whether always-on distribution is enabled for the segment |
append_mode | Integer | The append mode used for the distribution |
count | Integer | The number of RampIDs in the segment |
destination_name | String | The destination name. If there are multiple destinations in the distribution, multiple records are created. |
distributed_timestamp | Timestamp | The distribution date |
segment_id | String | The unique ID of the segment |
segment_name | String | The name of the segment |
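To confirm that a distribution arrived and to check its counts, you can query the reference table directly. The sketch below lists the most recent distribution records; the "acme" dataset prefix is a hypothetical placeholder.

```python
# Minimal sketch: show the 20 most recent segment distributions and their counts.
# The "acme" dataset prefix is a hypothetical placeholder.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT
  segment_id,
  segment_name,
  destination_name,
  count,
  distributed_timestamp
FROM `acme_wh.customer_profiles_segment_reference`
ORDER BY distributed_timestamp DESC
LIMIT 20
"""

for row in client.query(sql).result():
    print(
        row["distributed_timestamp"],
        row["segment_name"],
        row["destination_name"],
        row["count"],
    )
```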
GCS Buckets
In addition to BigQuery datasets, data scientists can use the following GCS buckets for storing data and code:
_CODEREPO (read, write): Store code artifacts for submitting non-interactive jobs to the Analytics Environment Spark cluster.
_WORK (read, write): Store files processed by the Spark cluster (a sandbox).
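As a sketch, the snippet below stages a Spark job script and an input file with the Cloud Storage Python client. The bucket names and file paths are hypothetical placeholders; substitute your tenant's actual _CODEREPO and _WORK bucket names.

```python
# Minimal sketch: upload a code artifact and a working file to GCS.
# Bucket names and file paths are hypothetical placeholders.
from google.cloud import storage

client = storage.Client()

# Replace with your tenant's actual _CODEREPO and _WORK bucket names.
CODE_BUCKET = "acme-coderepo"
WORK_BUCKET = "acme-work"

# Stage a code artifact for a non-interactive Spark job.
code_blob = client.bucket(CODE_BUCKET).blob("jobs/segment_overlap_job.py")
code_blob.upload_from_filename("segment_overlap_job.py")

# Stage an input file in the Spark sandbox bucket.
work_blob = client.bucket(WORK_BUCKET).blob("inputs/campaign_dates.csv")
work_blob.upload_from_filename("campaign_dates.csv")
```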