Storage#

Cloud Storage in 10 seconds#

Install the library#

The source code for the library (and demo code) lives on GitHub. You can install the library quickly with pip:

$ pip install gcloud

Run the demo#

In order to run the demo, you need to have registered an actual Google Cloud project, so you’ll need to provide some environment variables to facilitate authentication:

  • GCLOUD_TESTS_PROJECT_ID: Developers Console project ID (e.g. bamboo-shift-455).
  • GCLOUD_TESTS_DATASET_ID: The name of the dataset your tests connect to. This is typically the same as GCLOUD_TESTS_PROJECT_ID.
  • GOOGLE_APPLICATION_CREDENTIALS: The path to a JSON key file; see regression/app_credentials.json.sample as an example. Such a file can be downloaded directly from the Developers Console by clicking “Generate new JSON key”. See the private key docs for more details.
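
For example, in a Bash shell you might set these variables like so (the project ID and key path are placeholders for your own values):

$ export GCLOUD_TESTS_PROJECT_ID="your-project-id"
$ export GCLOUD_TESTS_DATASET_ID="your-project-id"
$ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/key.json"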

Run the example script included in the package:

$ python -m gcloud.storage.demo

And that’s it! You should be walking through a demonstration of using gcloud.storage to read and write data to Google Cloud Storage.

Try it yourself#

You can interact with a demo project in a Python interactive shell.

Start by importing the demo module and instantiating the demo connection:

>>> from gcloud.storage import demo
>>> connection = demo.get_connection()

Once you have the connection, you can create buckets and blobs:

>>> connection.get_all_buckets()
[<Bucket: ...>, ...]
>>> bucket = connection.create_bucket('my-new-bucket')
>>> print bucket
<Bucket: my-new-bucket>
>>> blob = bucket.new_blob('my-test-file.txt')
>>> print blob
<Blob: my-new-bucket, my-test-file.txt>
>>> blob = blob.upload_from_string('this is test content!')
>>> print blob.download_as_string()
'this is test content!'
>>> print bucket.get_all_blobs()
[<Blob: my-new-bucket, my-test-file.txt>]
>>> blob.delete()
>>> bucket.delete()

Note

The get_connection method is just a shortcut for:

>>> from gcloud import storage
>>> from gcloud.storage import demo
>>> connection = storage.get_connection(demo.PROJECT_ID)

gcloud.storage#

Shortcut methods for getting set up with Google Cloud Storage.

You’ll typically use these to get started with the API:

>>> import gcloud.storage
>>> bucket = gcloud.storage.get_bucket('bucket-id-here', 'project-id')
>>> # Then do other things...
>>> blob = bucket.get_blob('/remote/path/to/file.txt')
>>> print blob.download_as_string()
>>> blob.upload_from_string('New contents!')
>>> bucket.upload_file('/local/path.txt', 'remote/path/storage.txt')

The main concepts with this API are:

  • gcloud.storage.connection.Connection, which represents a connection between your machine and the Cloud Storage API.
  • gcloud.storage.bucket.Bucket, which represents a particular bucket (akin to a file-system directory).
  • gcloud.storage.blob.Blob, which represents a pointer to a particular entity in Cloud Storage (akin to a file).

gcloud.storage.__init__.get_bucket(bucket_name, project)[source]#

Shortcut method to establish a connection to a particular bucket.

You’ll generally use this as the first call to working with the API:

>>> from gcloud import storage
>>> bucket = storage.get_bucket(bucket_name, project)
>>> # Now you can do things with the bucket.
>>> bucket.exists('/path/to/file.txt')
False
Parameters:
  • bucket_name (string) – The id of the bucket you want to use. This is akin to a disk name on a file system.
  • project (string) – The name of the project to connect to.
Return type:

gcloud.storage.bucket.Bucket

Returns:

A bucket with a connection using the provided credentials.

gcloud.storage.__init__.get_connection(project)[source]#

Shortcut method to establish a connection to Cloud Storage.

Use this if you are going to access several buckets with the same set of credentials:

>>> from gcloud import storage
>>> connection = storage.get_connection(project)
>>> bucket1 = connection.get_bucket('bucket1')
>>> bucket2 = connection.get_bucket('bucket2')
Parameters:project (string) – The name of the project to connect to.
Return type:gcloud.storage.connection.Connection
Returns:A connection defined with the proper credentials.
gcloud.storage.__init__.set_default_bucket(bucket=None)[source]#

Set default bucket either explicitly or implicitly as fall-back.

In the implicit case, only an environment variable is currently supported; App Engine, Compute Engine, and other environments will be supported in the future.

In the implicit case, this relies on an implicit connection in addition to the implicit bucket name.

The local environment variable used is GCLOUD_BUCKET_NAME.

Parameters:bucket (gcloud.storage.bucket.Bucket) – Optional. The bucket to use as default.
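
For example, a minimal sketch of both styles ('my-bucket' and 'my-project' are placeholders; the implicit call assumes GCLOUD_BUCKET_NAME is set and a default connection is available):

>>> from gcloud import storage
>>> bucket = storage.get_bucket('my-bucket', 'my-project')
>>> storage.set_default_bucket(bucket)  # explicit
>>> storage.set_default_bucket()        # implicit; falls back to GCLOUD_BUCKET_NAME
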
gcloud.storage.__init__.set_default_connection(project=None, connection=None)[source]#

Set default connection either explicitly or implicitly as fall-back.

Parameters:
  • project (string) – Optional. The name of the project to connect to.
  • connection (gcloud.storage.connection.Connection) – Optional. The connection to use as the default.
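
For example (a sketch; 'my-project' is a placeholder):

>>> from gcloud import storage
>>> storage.set_default_connection(project='my-project')
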
gcloud.storage.__init__.set_default_project(project=None)[source]#

Set default project name either explicitly or implicitly as fall-back.

In the implicit case, only an environment variable is currently supported; App Engine, Compute Engine, and other environments will be supported in the future.

The local environment variable used is GCLOUD_PROJECT.

Parameters:project (string) – Optional. The project name to use as default.
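
For example, a sketch of both styles ('my-project' is a placeholder; the implicit call assumes GCLOUD_PROJECT is set):

>>> from gcloud import storage
>>> storage.set_default_project('my-project')  # explicit
>>> storage.set_default_project()              # implicit; falls back to GCLOUD_PROJECT
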
gcloud.storage.__init__.set_defaults(bucket=None, project=None, connection=None)[source]#

Set defaults either explicitly or implicitly as fall-back.

Uses the arguments to call the individual default methods.

Parameters:
  • bucket (gcloud.storage.bucket.Bucket) – Optional. The bucket to use as default.
  • project (string) – Optional. The project name to use as default.
  • connection (gcloud.storage.connection.Connection) – Optional. The connection to use as the default.
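
For example, a sketch that sets a default project and lets the connection and bucket fall back to their implicit defaults ('my-project' is a placeholder):

>>> from gcloud import storage
>>> storage.set_defaults(project='my-project')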

Connections#

Create / interact with gcloud storage connections.

class gcloud.storage.connection.Connection(project, *args, **kwargs)[source]#

Bases: gcloud.connection.Connection

A connection to Google Cloud Storage via the JSON REST API.

This class should understand only the basic types (and protobufs) in method arguments, but it should be capable of returning advanced types.

See gcloud.connection.Connection for a full list of parameters. Connection differs only in needing a project name (which you specify when creating a project in the Cloud Console).

A typical use of this is to operate on gcloud.storage.bucket.Bucket objects:

>>> from gcloud import storage
>>> connection = storage.get_connection(project)
>>> bucket = connection.create_bucket('my-bucket-name')

You can then delete this bucket:

>>> bucket.delete()
>>> # or
>>> connection.delete_bucket(bucket)

If you want to access an existing bucket:

>>> bucket = connection.get_bucket('my-bucket-name')

A Connection is actually iterable and will return the gcloud.storage.bucket.Bucket objects inside the project:

>>> for bucket in connection:
...   print bucket
<Bucket: my-bucket-name>

Similarly, you can check whether a bucket exists inside the project using Python’s in operator:

>>> print 'my-bucket-name' in connection
True
Parameters:project (string) – The project name to connect to.
API_URL_TEMPLATE = '{api_base_url}/storage/{api_version}{path}'#

A template for the URL of a particular API call.

API_VERSION = 'v1'#

The version of the API, used in building the API call’s URL.

api_request(method, path, query_params=None, data=None, content_type=None, api_base_url=None, api_version=None, expect_json=True)[source]#

Make a request over the HTTP transport to the Cloud Storage API.

You shouldn’t need to use this method, but if you plan to interact with the API using these primitives, this is the correct method to use.

Parameters:
  • method (string) – The HTTP method name (e.g., GET, POST). Required.
  • path (string) – The path to the resource (e.g., '/b/bucket-name'). Required.
  • query_params (dict) – A dictionary of keys and values to insert into the query string of the URL. Default is empty dict.
  • data (string) – The data to send as the body of the request. Default is the empty string.
  • content_type (string) – The proper MIME type of the data provided. Default is None.
  • api_base_url (string) – The base URL for the API endpoint. Typically you won’t have to provide this. Default is the standard API base URL.
  • api_version (string) – The version of the API to call. Typically you shouldn’t provide this and instead use the default for the library. Default is the latest API version supported by gcloud-python.
  • expect_json (boolean) – If True, this method will try to parse the response as JSON and raise an exception if that cannot be done. Default is True.
Raises:

Exception if the response code is not 200 OK.
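
For example, a sketch of fetching a bucket’s raw JSON representation with this low-level helper ('my-bucket' is a placeholder):

>>> from gcloud import storage
>>> connection = storage.get_connection(project)
>>> resource = connection.api_request('GET', '/b/my-bucket')
>>> print resource['name']
my-bucket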

build_api_url(path, query_params=None, api_base_url=None, api_version=None, upload=False)[source]#

Construct an API url given a few components, some optional.

Typically, you shouldn’t need to use this method.

Parameters:
  • path (string) – The path to the resource (e.g., '/b/bucket-name').
  • query_params (dict) – A dictionary of keys and values to insert into the query string of the URL.
  • api_base_url (string) – The base URL for the API endpoint. Typically you won’t have to provide this.
  • api_version (string) – The version of the API to call. Typically you shouldn’t provide this and instead use the default for the library.
  • upload (boolean) – True if the URL is for uploading purposes.
Return type:

string

Returns:

The URL assembled from the pieces provided.
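
For instance (a sketch; the exact URL also depends on the connection’s base URL and any default query parameters):

>>> url = connection.build_api_url('/b/my-bucket',
...                                query_params={'maxResults': 10})
>>> print url  # e.g. https://www.googleapis.com/storage/v1/b/my-bucket?...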

create_bucket(bucket_name)[source]#

Create a new bucket.

For example:

>>> from gcloud import storage
>>> connection = storage.get_connection(project)
>>> bucket = connection.create_bucket('my-bucket')
>>> print bucket
<Bucket: my-bucket>
Parameters:bucket_name (string) – The bucket name to create.
Return type:gcloud.storage.bucket.Bucket
Returns:The newly created bucket.
Raises:gcloud.exceptions.Conflict if there is a conflict (bucket already exists, invalid name, etc.)
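
For example, you can guard against a name that is already taken:

>>> from gcloud.exceptions import Conflict
>>> try:
...   bucket = connection.create_bucket('my-bucket')
... except Conflict:
...   print 'That bucket name is already taken!'
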
delete_bucket(bucket_name)[source]#

Delete a bucket.

You can use this method to delete a bucket by name.

>>> from gcloud import storage
>>> connection = storage.get_connection(project)
>>> connection.delete_bucket('my-bucket')

If the bucket doesn’t exist, this will raise a gcloud.exceptions.NotFound:

>>> from gcloud.exceptions import NotFound
>>> try:
...   connection.delete_bucket('my-bucket')
... except NotFound:
...   print 'That bucket does not exist!'

If the bucket still has objects in it, this will raise a gcloud.exceptions.Conflict:

>>> from gcloud.exceptions import Conflict
>>> try:
...   connection.delete_bucket('my-bucket')
... except Conflict:
...   print 'That bucket is not empty!'
Parameters:bucket_name (string) – The bucket name to delete.
get_all_buckets()[source]#

Get all buckets in the project.

This will not populate the list of blobs available in each bucket.

You can also iterate over the connection object, so these two operations are identical:

>>> from gcloud import storage
>>> connection = storage.get_connection(project)
>>> for bucket in connection.get_all_buckets():
...   print bucket
>>> # ... is the same as ...
>>> for bucket in connection:
...   print bucket
Return type:list of gcloud.storage.bucket.Bucket objects.
Returns:All buckets belonging to this project.
get_bucket(bucket_name)[source]#

Get a bucket by name.

If the bucket isn’t found, this will raise a gcloud.exceptions.NotFound.

For example:

>>> from gcloud import storage
>>> from gcloud.exceptions import NotFound
>>> connection = storage.get_connection(project)
>>> try:
...   bucket = connection.get_bucket('my-bucket')
... except NotFound:
...   print 'Sorry, that bucket does not exist!'
Parameters:bucket_name (string) – The name of the bucket to get.
Return type:gcloud.storage.bucket.Bucket
Returns:The bucket matching the name provided.
Raises:gcloud.exceptions.NotFound
make_request(method, url, data=None, content_type=None, headers=None)[source]#

A low-level method to send a request to the API.

Typically, you shouldn’t need to use this method.

Parameters:
  • method (string) – The HTTP method to use in the request.
  • url (string) – The URL to send the request to.
  • data (string) – The data to send as the body of the request.
  • content_type (string) – The proper MIME type of the data provided.
  • headers (dict) – A dictionary of HTTP headers to send with the request.
Return type:

tuple of response (a dictionary of sorts) and content (a string).

Returns:

The HTTP response object and the content of the response.
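
A minimal sketch of calling it directly, reusing build_api_url from above ('/b/my-bucket' is a placeholder):

>>> url = connection.build_api_url('/b/my-bucket')
>>> response, content = connection.make_request('GET', url)
>>> print response.status
200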

Iterators#

Iterators for paging through API responses.

These iterators simplify the process of paging through API responses where the response is a list of results with a nextPageToken.

To make an iterator work, override the get_items_from_response method so that, given a response containing a page of results, it parses those results into an iterable of the objects you want:

class MyIterator(Iterator):
  def get_items_from_response(self, response):
    items = response.get('items', [])
    for item in items:
      yield MyItemClass(properties=item, other_arg=True)

You then can use this to get all the results from a resource:

>>> iterator = MyIterator(...)
>>> list(iterator)  # Convert to a list (consumes all values).

Or you can walk your way through items and call off the search early if you find what you’re looking for (resulting in possibly fewer requests):

>>> for item in MyIterator(...):
...   print item.name
...   if not item.is_valid:
...     break
class gcloud.storage.iterator.Iterator(connection, path, extra_params=None)[source]#

Bases: object

A generic class for iterating through Cloud Storage list responses.

Parameters:
  • connection (gcloud.storage.connection.Connection) – The connection to use for making requests.
  • path (string) – The path to query for a list of items.
  • extra_params (dict) – Optional. Extra query-string parameters to include with each request.
PAGE_TOKEN = 'pageToken'#
RESERVED_PARAMS = frozenset(['pageToken'])#
get_items_from_response(response)[source]#

Factory method called while iterating; subclasses must override it.

It should accept the API response for the next page of items and return a list (or other iterable) of those items.

Typically this method will construct a Bucket or a Blob from the page of results in the response.

Parameters:response (dict) – The response of asking for the next page of items.
Return type:iterable
Returns:Items that the iterator should yield.
get_next_page_response()[source]#

Requests the next page from the path provided.

Return type:dict
Returns:The parsed JSON response of the next page’s contents.
get_query_params()[source]#

Getter for query parameters for the next request.

Return type:dict
Returns:A dictionary of query parameters.
has_next_page()[source]#

Determines whether or not this iterator has more pages.

Return type:boolean
Returns:Whether the iterator has more pages or not.
reset()[source]#

Resets the iterator to the beginning.
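
Putting these primitives together, a sketch of driving the paging loop by hand (MyIterator is the example subclass from above, and '/b/my-bucket/o' is a placeholder path; normally you would simply iterate):

>>> iterator = MyIterator(connection, '/b/my-bucket/o')
>>> while iterator.has_next_page():
...   response = iterator.get_next_page_response()
...   for item in iterator.get_items_from_response(response):
...     print item
>>> iterator.reset()  # rewind to the first page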