Gcloud::Bigquery::Dataset
Dataset¶ ↑
Represents a Dataset. A dataset is a grouping mechanism that holds zero or more tables. Datasets are the lowest level unit of access control; you cannot control access at the table level. A dataset is contained within a specific project.
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery dataset = bigquery.create_dataset "my_dataset", name: "My Dataset" description: "This is my Dataset"
Attributes Methods
Public Instance Methods
dataset_id()
¶
↑
A unique ID for this dataset, without the project name. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters.
default_expiration=(new_default_expiration)
¶
↑
Updates the default lifetime of all tables in the dataset, in milliseconds.
Data Methods
Public Instance Methods
query(query, options = {})
¶
↑
Queries data using the synchronous method.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
Parameters¶ ↑
query
-
A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”. (
String
) options[:max]
-
The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies. (
Integer
) options[:timeout]
-
How long to wait for the query to complete, in milliseconds, before the request times out and returns. Note that this is only a timeout for the request, not the query. If the query takes longer to run than the timeout value, the call returns without any results and with Gcloud::Bigquery::QueryData#complete? set to false. The default value is 10000 milliseconds (10 seconds). (
Integer
) options[:dryrun]
-
If set to
true
, BigQuery doesn't run the job. Instead, if the query is valid, BigQuery returns statistics about the job such as how many bytes would be processed. If the query is invalid, an error returns. The default value isfalse
. (Boolean
) options[:cache]
-
Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is true. For more information, see query caching. (
Boolean
)
Returns¶ ↑
Example¶ ↑
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery data = bigquery.query "SELECT name FROM my_table" data.each do |row| puts row["name"] end
query_job(query, options = {})
¶
↑
Queries data using the asynchronous method.
Sets the current dataset as the default dataset in the query. Useful for using unqualified table names.
Parameters¶ ↑
query
-
A query string, following the BigQuery query syntax, of the query to execute. Example: “SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]”. (
String
) options[:priority]
-
Specifies a priority for the query. Possible values include
INTERACTIVE
andBATCH
. The default value isINTERACTIVE
. (String
) options[:cache]
-
Whether to look for the result in the query cache. The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. The default value is
true
. (Boolean
) options[:table]
-
The destination table where the query results should be stored. If not present, a new table will be created to store the results. (
Table
) options[:create]
-
Specifies whether the job is allowed to create new tables. (
String
)The following values are supported:
-
needed
- Create the table if it does not exist. -
never
- The table must already exist. A 'notFound' error is raised if the table does not exist.
-
options[:write]
-
Specifies the action that occurs if the destination table already exists. (
String
)The following values are supported:
-
truncate
- BigQuery overwrites the table data. -
append
- BigQuery appends the data to the table. -
empty
- A 'duplicate' error is returned in the job result if the table exists and contains data.
-
options[:large_results]
-
If
true
, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requiresoptions[:table]
to be set. (Boolean
) options[:flatten]
-
Flattens all nested and repeated fields in the query results. The default value is
true
.options[:large_results]
must betrue
if this is set tofalse
. (Boolean
)
Returns¶ ↑
Example¶ ↑
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery job = bigquery.query_job "SELECT name FROM my_table" loop do break if job.done? sleep 1 job.refresh! end if !job.failed? job.query_results.each do |row| puts row["name"] end end
Lifecycle Methods
Public Instance Methods
delete(options = {})
¶
↑
Permanently deletes the dataset. The dataset must be empty before it can be
deleted unless the force
option is set to true
.
Parameters¶ ↑
options
-
An optional Hash for controlling additional behavior. (
Hash
) options[:force]
-
If
true
, delete all the tables in the dataset. Iffalse
and the dataset contains tables, the request will fail. Default isfalse
. (Boolean
)
Returns¶ ↑
true
if the dataset was deleted.
Example¶ ↑
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery dataset = bigquery.dataset "my_dataset" dataset.delete
Table Methods
Public Instance Methods
create_table(table_id, options = {})
¶
↑
Creates a new table.
Parameters¶ ↑
table_id
-
The ID of the table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. (
String
) options
-
An optional Hash for controlling additional behavior. (
Hash
) options[:name]
-
A descriptive name for the table. (
String
) options[:description]
-
A user-friendly description of the table. (
String
) options[:schema]
-
A schema specifying fields and data types for the table. See the Tables resource for more information. (
Hash
)
Returns¶ ↑
Examples¶ ↑
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery dataset = bigquery.dataset "my_dataset" table = dataset.create_table "my_table"
A name and description can be provided:
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery dataset = bigquery.dataset "my_dataset" schema = { "fields" => [ { "name" => "first_name", "type" => "STRING", "mode" => "REQUIRED" }, { "name" => "cities_lived", "type" => "RECORD", "mode" => "REPEATED", "fields" => [ { "name" => "place", "type" => "STRING", "mode" => "REQUIRED" }, { "name" => "number_of_years", "type" => "INTEGER", "mode" => "REQUIRED" } ] } ] } table = dataset.create_table "my_table", name: "My Table", schema: schema
create_view(table_id, query, options = {})
¶
↑
Creates a new view table from the given query.
Parameters¶ ↑
table_id
-
The ID of the view table. The ID must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_). The maximum length is 1,024 characters. (
String
) query
-
The query that BigQuery executes when the view is referenced. (
String
) options
-
An optional Hash for controlling additional behavior. (
Hash
) options[:name]
-
A descriptive name for the table. (
String
) options[:description]
-
A user-friendly description of the table. (
String
)
Returns¶ ↑
Examples¶ ↑
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery dataset = bigquery.dataset "my_dataset" view = dataset.create_view "my_view", "SELECT name, age FROM [proj:dataset.users]"
A name and description can be provided:
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery dataset = bigquery.dataset "my_dataset" view = dataset.create_view "my_view", "SELECT name, age FROM [proj:dataset.users]", name: "My View", description: "This is my view"
table(table_id)
¶
↑
Retrieves an existing table by ID.
Parameters¶ ↑
table_id
-
The ID of a table. (
String
)
Returns¶ ↑
Gcloud::Bigquery::Table or Gcloud::Bigquery::View or nil if the table does not exist
Example¶ ↑
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery dataset = bigquery.dataset "my_dataset" table = dataset.table "my_table" puts table.name
tables(options = {})
¶
↑
Retrieves the list of tables belonging to the dataset.
Parameters¶ ↑
options
-
An optional Hash for controlling additional behavior. (
Hash
) options[:token]
-
A previously-returned page token representing part of the larger set of results to view. (
String
) options[:max]
-
Maximum number of tables to return. (
Integer
)
Returns¶ ↑
Array of Gcloud::Bigquery::Table or Gcloud::Bigquery::View (Gcloud::Bigquery::Table::List)
Examples¶ ↑
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery dataset = bigquery.dataset "my_dataset" tables = dataset.tables tables.each do |table| puts table.name end
If you have a significant number of tables, you may need to paginate through them: (See Gcloud::Bigquery::Dataset::List#token)
require "gcloud" gcloud = Gcloud.new bigquery = gcloud.bigquery dataset = bigquery.dataset "my_dataset" all_tables = [] tmp_tables = dataset.tables while tmp_tables.any? do tmp_tables.each do |table| all_tables << table end # break loop if no more tables available break if tmp_tables.token.nil? # get the next group of tables tmp_tables = dataset.tables token: tmp_tables.token end