Gcloud::Bigquery::Project
Project
Projects are top-level containers in Google Cloud Platform. They store information about billing and authorized users, and they contain BigQuery data. Each project has a friendly name and a unique ID.
Gcloud::Bigquery::Project is the main object for interacting with Google BigQuery. Gcloud::Bigquery::Dataset objects are created, accessed, and deleted by Gcloud::Bigquery::Project.
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery
  dataset = bigquery.dataset "my_dataset"
  table = dataset.table "my_table"
See Gcloud#bigquery
Methods
Public Instance Methods
create_dataset(dataset_id, options = {})
Creates a new dataset.
Parameters
dataset_id - A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. (String)
options - An optional Hash for controlling additional behavior. (Hash)
options[:name] - A descriptive name for the dataset. (String)
options[:description] - A user-friendly description of the dataset. (String)
options[:expiration] - The default lifetime of all tables in the dataset, in milliseconds. The minimum value is 3600000 milliseconds (one hour). (Integer)
Returns
Gcloud::Bigquery::Dataset
Examples
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  dataset = bigquery.create_dataset "my_dataset"
A name and description can be provided:
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  dataset = bigquery.create_dataset "my_dataset",
                                    name: "My Dataset",
                                    description: "This is my Dataset"
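The ID and expiration constraints listed under Parameters can be checked locally before calling create_dataset. The following is a minimal sketch, not part of the gcloud gem: the helper names valid_dataset_id? and expiration_ms are hypothetical, and treating an empty ID as invalid is an assumption the documentation does not state.

```ruby
# Hypothetical helpers, not part of the gcloud gem. They only encode the
# constraints documented for create_dataset.

# Letters, numbers, or underscores; at most 1,024 characters.
# Assumption: an empty ID is treated as invalid.
DATASET_ID_PATTERN = /\A[A-Za-z0-9_]{1,1024}\z/
MIN_EXPIRATION_MS  = 3_600_000 # one hour, the documented minimum

# Returns true if the ID satisfies the documented character and length rules.
def valid_dataset_id?(dataset_id)
  DATASET_ID_PATTERN.match?(dataset_id.to_s)
end

# Converts a lifetime in hours to milliseconds for options[:expiration],
# rejecting values below the documented one-hour minimum.
def expiration_ms(hours)
  ms = (hours * 3_600_000).to_i
  raise ArgumentError, "expiration must be at least one hour" if ms < MIN_EXPIRATION_MS
  ms
end

puts valid_dataset_id?("my_dataset")  # true
puts valid_dataset_id?("my-dataset")  # false (hyphens are not allowed)
puts expiration_ms(24)                # 86400000
```

The value returned by expiration_ms could then be passed as the expiration option, for example expiration: expiration_ms(24).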
datasets(options = {})
Retrieves the list of datasets belonging to the project.
Parameters
options - An optional Hash for controlling additional behavior. (Hash)
options[:all] - Whether to list all datasets, including hidden ones. The default is false. (Boolean)
options[:token] - A previously-returned page token representing part of the larger set of results to view. (String)
options[:max] - Maximum number of datasets to return. (Integer)
Returns
Array of Gcloud::Bigquery::Dataset (Gcloud::Bigquery::Dataset::List)
Examples
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  datasets = bigquery.datasets
  datasets.each do |dataset|
    puts dataset.name
  end
You can also retrieve all datasets, including hidden ones, by providing the :all option:
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  all_datasets = bigquery.datasets all: true
If you have a significant number of datasets, you may need to paginate through them: (See Gcloud::Bigquery::Dataset::List#token)
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  all_datasets = []
  tmp_datasets = bigquery.datasets
  while tmp_datasets.any? do
    tmp_datasets.each do |dataset|
      all_datasets << dataset
    end
    # break loop if no more datasets available
    break if tmp_datasets.token.nil?
    # get the next group of datasets
    tmp_datasets = bigquery.datasets token: tmp_datasets.token
  end
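The token loop above can be factored into a reusable helper. This is a sketch only: collect_all_pages and FakePage are hypothetical names, not part of the gcloud gem, and the helper assumes nothing beyond a list that responds to each and token. Unlike the loop above, it keeps going on an empty page as long as a token is present.

```ruby
# Hypothetical helper, not part of the gcloud gem. The block receives the
# page token (nil for the first request) and must return one page of results.
def collect_all_pages
  results = []
  page = yield(nil)              # first request, no token
  loop do
    page.each { |item| results << item }
    break if page.token.nil?     # no more pages available
    page = yield(page.token)     # next request, using the page token
  end
  results
end

# Stand-in for a paged list: an Array that also carries a page token.
class FakePage < Array
  attr_accessor :token
end

PAGES = {
  nil  => FakePage.new(%w[a b]).tap { |page| page.token = "t1" },
  "t1" => FakePage.new(%w[c]).tap { |page| page.token = nil }
}

all = collect_all_pages { |token| PAGES.fetch(token) }
p all  # ["a", "b", "c"]
```

With a real project object, the block would presumably call bigquery.datasets for a nil token and bigquery.datasets token: token otherwise.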
jobs(options = {})
Retrieves the list of jobs belonging to the project.
Parameters
options - An optional Hash for controlling additional behavior. (Hash)
options[:all] - Whether to display jobs owned by all users in the project. The default is false. (Boolean)
options[:token] - A previously-returned page token representing part of the larger set of results to view. (String)
options[:max] - Maximum number of jobs to return. (Integer)
options[:filter] - A filter for job state. (String) Acceptable values are:
  done - Finished jobs
  pending - Pending jobs
  running - Running jobs
Returns
Array of Gcloud::Bigquery::Job (Gcloud::Bigquery::Job::List)
Examples
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  jobs = bigquery.jobs
You can also retrieve only running jobs using the :filter option:
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  running_jobs = bigquery.jobs filter: "running"
If you have a significant number of jobs, you may need to paginate through them: (See Gcloud::Bigquery::Job::List#token)
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  all_jobs = []
  tmp_jobs = bigquery.jobs
  while tmp_jobs.any? do
    tmp_jobs.each do |job|
      all_jobs << job
    end
    # break loop if no more jobs available
    break if tmp_jobs.token.nil?
    # get the next group of jobs
    tmp_jobs = bigquery.jobs token: tmp_jobs.token
  end
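Because the :filter option accepts only three documented values, it can be worth validating it before making a request. The sketch below is an illustration under that assumption: jobs_options and JOB_STATE_FILTERS are hypothetical names and are not part of the gcloud gem.

```ruby
# Hypothetical helper, not part of the gcloud gem.
JOB_STATE_FILTERS = %w[done pending running].freeze

# Builds an options Hash for jobs, rejecting any filter value that the
# documentation does not list.
def jobs_options(filter: nil, all: false, max: nil)
  if filter && !JOB_STATE_FILTERS.include?(filter.to_s)
    raise ArgumentError, "filter must be one of: #{JOB_STATE_FILTERS.join(', ')}"
  end

  options = { all: all }
  options[:filter] = filter.to_s if filter
  options[:max] = Integer(max) if max
  options
end

p jobs_options(filter: "running", max: 50)
```

The resulting Hash could then be passed straight through, for example bigquery.jobs jobs_options(filter: "running").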
query(query, options = {})
Queries data using the synchronous method.
Parameters
query - A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”. (String)
options[:max] - The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies. (Integer)
options[:timeout] - How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with Gcloud::Bigquery::QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds). (Integer)
options[:dryrun] - If set to true, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value is false. (Boolean)
options[:cache] - Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching. (Boolean)
options[:dataset] - Specifies the default datasetId and projectId to assume for any unqualified table names in the query. If not set, all table names in the query string must be qualified in the format 'datasetId.tableId'. (String)
options[:project] - Specifies the default projectId to assume for any unqualified table names in the query. Only used if dataset option is set. (String)
Returns
Gcloud::Bigquery::QueryData
Example
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  data = bigquery.query "SELECT name FROM [my_proj:my_data.my_table]"
  data.each do |row|
    puts row["name"]
  end
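The bracketed table reference in the query string follows the pattern [projectId:datasetId.tableId], with the project part optional. A small sketch of building such a reference is shown here; table_ref is a hypothetical helper, not part of the gcloud gem, and it performs no escaping or validation of the IDs.

```ruby
# Hypothetical helper, not part of the gcloud gem. It formats a table
# reference in the bracketed form used by the query examples.
def table_ref(dataset_id, table_id, project_id = nil)
  qualified = "#{dataset_id}.#{table_id}"
  qualified = "#{project_id}:#{qualified}" if project_id
  "[#{qualified}]"
end

sql = "SELECT name FROM #{table_ref('my_data', 'my_table', 'my_proj')}"
puts sql  # SELECT name FROM [my_proj:my_data.my_table]
```

The resulting string could be passed to query as its first argument.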
query_job(query, options = {})
Queries data using the asynchronous method.
Parameters
query - A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”. (String)
options[:priority] - Specifies a priority for the query. Possible values include INTERACTIVE and BATCH. The default value is INTERACTIVE. (String)
options[:cache] - Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. (Boolean)
options[:table] - The destination table where the query results should be stored. If not present, a new table will be created to store the results. (Table)
options[:create] - Specifies whether the job is allowed to create new tables. (String) The following values are supported:
  needed - Create the table if it does not exist.
  never - The table must already exist. A 'notFound' error is raised if the table does not exist.
options[:write] - Specifies the action that occurs if the destination table already exists. (String) The following values are supported:
  truncate - BigQuery overwrites the table data.
  append - BigQuery appends the data to the table.
  empty - A 'duplicate' error is returned in the job result if the table exists and contains data.
options[:large_results] - If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires options[:table] to be set. (Boolean)
options[:flatten] - Flattens all nested and repeated fields in the query results. The default value is true. options[:large_results] must be true if this is set to false. (Boolean)
options[:dataset] - Specifies the default dataset to use for unqualified table names in the query. (Dataset or String)
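Several of these options depend on one another, so a local check before submitting a job may catch mistakes early. The following is a rough sketch under the rules stated above; query_job_option_errors, CREATE_VALUES, and WRITE_VALUES are hypothetical names and are not part of the gcloud gem.

```ruby
# Hypothetical helper, not part of the gcloud gem. It only encodes the
# option rules documented for query_job.
CREATE_VALUES = %w[needed never].freeze
WRITE_VALUES  = %w[truncate append empty].freeze

# Returns an Array of problems found in the options Hash (empty if none).
def query_job_option_errors(options = {})
  errors = []
  if options.key?(:create) && !CREATE_VALUES.include?(options[:create].to_s)
    errors << ":create must be one of #{CREATE_VALUES.join(', ')}"
  end
  if options.key?(:write) && !WRITE_VALUES.include?(options[:write].to_s)
    errors << ":write must be one of #{WRITE_VALUES.join(', ')}"
  end
  if options[:large_results] && options[:table].nil?
    errors << ":large_results requires :table to be set"
  end
  if options[:flatten] == false && !options[:large_results]
    errors << ":flatten may be false only when :large_results is true"
  end
  errors
end

puts query_job_option_errors(write: "append", flatten: true).empty?  # true
puts query_job_option_errors(flatten: false)
# :flatten may be false only when :large_results is true
```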
Returns
Gcloud::Bigquery::QueryJob
Example
  require "gcloud"

  gcloud = Gcloud.new
  bigquery = gcloud.bigquery

  job = bigquery.query_job "SELECT name FROM [my_proj:my_data.my_table]"

  loop do
    break if job.done?
    sleep 1
    job.refresh!
  end
  if !job.failed?
    job.query_results.each do |row|
      puts row["name"]
    end
  end
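The polling loop in the example waits indefinitely. One possible variation, sketched here, gives up after a deadline. wait_for_job and FakeJob are hypothetical names, not part of the gcloud gem; the helper assumes only that the job responds to done? and refresh!, as in the example.

```ruby
# Hypothetical helper, not part of the gcloud gem. Polls the job until it is
# done or the timeout (in seconds) has elapsed. Returns true if the job
# finished, false if the deadline passed first.
def wait_for_job(job, timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until job.done?
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
    job.refresh!
  end
  true
end

# Stand-in job that reports done after a fixed number of refreshes.
class FakeJob
  def initialize(refreshes_needed)
    @remaining = refreshes_needed
  end

  def done?
    @remaining <= 0
  end

  def refresh!
    @remaining -= 1
  end
end

puts wait_for_job(FakeJob.new(3), interval: 0.01)                     # true
puts wait_for_job(FakeJob.new(1000), timeout: 0.05, interval: 0.01)   # false
```

A monotonic clock is used for the deadline so that changes to the system time do not shorten or extend the wait.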