Applications Available
The Analytics Environment virtual machine desktop provides the following applications for performing a variety of tasks with your Safe Haven data:
Google Cloud Platform (GCP) console, which provides access to the following applications:
BigQuery: A data warehouse that enables fast SQL queries
Applications in the Analytics Environment rely on the following data processing engines for running code:
BigQuery engine (a fully managed big data SQL engine) when using the BigQuery console or the BigQuery Python API
Dataproc Spark cluster (managed Hadoop servers) when using PySpark or Python
Dataproc Console: Uses Spark Jobs Submit to submit PySpark and Python jobs to a distributed Hadoop cluster
Other GCP applications are locked down and cannot be used in your Safe Haven instance.
Job Management: Features that enable you to create, schedule, and manage PySpark and Python jobs and receive email notifications about their status.
Jupyter: Allows you to manipulate your Safe Haven data interactively using Python or PySpark. You can use JupyterLab's GitLab extension to provide version control for your code in JupyterLab.
Tableau Desktop: Users with the LSH Data Analyst persona can access Tableau Desktop to build reports and visualizations. Users with the LSH Admin persona or the LSH Data Scientist persona do not have access to Tableau Desktop.
Tableau Server: Users who are granted the LSH Data Analyst w publishing persona can edit and publish Tableau reports. Users with the following personas can edit reports but cannot save them:
LSH Admin
LSH Campaign Planner w report editing
LSH Data Analyst w/o publishing
LSH Data Scientist
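As a rough illustration of the BigQuery Python API mentioned in the list above, the sketch below runs a SQL statement on the BigQuery engine and returns the rows as dictionaries. It assumes the `google-cloud-bigquery` package is installed and that credentials are already configured in your environment. The table name and the `run_query` helper are hypothetical and are used only for illustration.

```python
def run_query(sql, client=None):
    """Run a SQL statement on the BigQuery engine and return rows as dicts."""
    if client is None:
        # Assumption: google-cloud-bigquery is installed and default
        # credentials are available in the environment.
        from google.cloud import bigquery

        client = bigquery.Client()
    query_job = client.query(sql)  # starts the query job
    # result() waits for the job to finish and yields the result rows.
    return [dict(row) for row in query_job.result()]


# Hypothetical table name, for illustration only.
EXAMPLE_SQL = """
SELECT segment, COUNT(*) AS n
FROM `my_project.my_dataset.my_table`
GROUP BY segment
ORDER BY n DESC
LIMIT 10
"""
```

Calling `run_query(EXAMPLE_SQL)` from a notebook cell or a Python job would then return a list such as one dictionary per segment. The optional `client` argument is a design choice that makes it possible to pass in a preconfigured client.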
Interactive vs. Non-interactive
Some applications allow you to run code in the following ways:
Interactive: You can run the code for each step of the overall process independently and see the results for that step.
Non-interactive: The entire set of code runs at once, and you must wait for all of it to finish before seeing the results. Non-interactive processing is recommended if your code requires advanced scripting or library use.
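The difference between the two modes can be sketched in plain Python. In the example below, every function name and the sample rows are hypothetical stand-ins rather than part of any Safe Haven API. Interactively, you would call each step on its own and inspect what it returns. Non-interactively, a single entry point runs all of the steps and you see only the final result.

```python
def load_rows():
    # Hypothetical stand-in for reading Safe Haven data.
    return [
        {"segment": "a", "spend": 10.0},
        {"segment": "b", "spend": 5.0},
        {"segment": "a", "spend": 2.5},
    ]


def total_by_segment(rows):
    # Sum the spend per segment.
    totals = {}
    for row in rows:
        totals[row["segment"]] = totals.get(row["segment"], 0.0) + row["spend"]
    return totals


def top_segment(totals):
    # Return the segment with the largest total spend.
    return max(totals, key=totals.get)


def run_all():
    # Non-interactive style: every step runs in sequence, with no
    # opportunity to inspect intermediate results along the way.
    return top_segment(total_by_segment(load_rows()))
```

In an interactive session you might run `rows = load_rows()`, then `totals = total_by_segment(rows)`, checking each value before moving on. A non-interactive job would simply call `run_all()`.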
Tools for querying and analyzing data in an interactive way:
BigQuery (from within the GCP Console)
Jupyter Notebooks
Tools for submitting non-interactive jobs:
Dataproc jobs submit (from within the GCP Console)
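For orientation, a script submitted as a non-interactive PySpark job might look roughly like the sketch below. It assumes PySpark is available on the cluster. The `--min-spend` argument, the application name, and the inline sample rows are hypothetical and stand in for your own parameters and Safe Haven tables.

```python
import argparse


def parse_args(argv):
    """Parse job arguments; --min-spend is a hypothetical parameter."""
    parser = argparse.ArgumentParser(description="Example non-interactive PySpark job")
    parser.add_argument("--min-spend", type=float, default=0.0)
    return parser.parse_args(argv)


def main(argv):
    # Imported here so that argument parsing can be checked without Spark.
    from pyspark.sql import SparkSession

    args = parse_args(argv)
    spark = SparkSession.builder.appName("example-job").getOrCreate()
    try:
        # Inline sample rows stand in for a real table (assumption).
        df = spark.createDataFrame(
            [("a", 10.0), ("b", 5.0), ("a", 2.5)], ["segment", "spend"]
        )
        # The whole pipeline runs in one pass; the counts appear only
        # in the job output once the job has run.
        df.filter(df["spend"] >= args.min_spend).groupBy("segment").count().show()
    finally:
        spark.stop()
```

A job script built this way would typically end with a call such as `main(sys.argv[1:])` (after `import sys`), so that any arguments supplied when the job is submitted reach `parse_args`.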