Copyright (c) 2021, 2022 Oracle and/or its affiliates. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/
Getting Started with ocifs
The Oracle Cloud Infrastructure (OCI) Object Storage filesystem (ocifs) is an fsspec implementation for use with Object Storage.
Quickstart with Pandas
Begin by importing ocifs and pandas. Importing ocifs registers the oci protocol with pandas:
import ocifs
import pandas as pd
Now that the oci protocol is registered with pandas, you can read from and write to Object Storage as easily as you can locally. For example, you can read an Excel file, path/file.xls, from a bucket in your namespace using:
df = pd.read_excel("oci://bucket@namespace/path/file.xls",
                   storage_options={"config": "~/.oci/config"})
Similarly, you can write a DataFrame back to Object Storage, here as a Parquet file:
df.to_parquet("oci://bucket@namespace/path/file.parquet",
              storage_options={"config": "~/.oci/config"})
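Every URL in these examples has the shape oci://bucket@namespace/path. The bucket comes before the @ and the Object Storage namespace comes after it. The object name, which may contain slashes, follows the first /. The helper below is a hypothetical, stdlib-only sketch for splitting a URL into those parts, for example to validate or log a path. It is not a function exported by ocifs.

```python
from typing import NamedTuple


class OCIPath(NamedTuple):
    bucket: str
    namespace: str
    object_name: str  # empty when the URL points at the bucket itself


def parse_oci_url(url: str) -> OCIPath:
    """Split 'oci://bucket@namespace/path/to/object' into its parts (hypothetical helper)."""
    prefix = "oci://"
    if not url.startswith(prefix):
        raise ValueError(f"expected a URL starting with {prefix!r}: {url!r}")
    # Everything before the first "/" is "bucket@namespace"; the rest is the object name.
    location, _, object_name = url[len(prefix):].partition("/")
    bucket, at, namespace = location.partition("@")
    if not at or not bucket or not namespace:
        raise ValueError(f"expected 'bucket@namespace' in {url!r}")
    return OCIPath(bucket, namespace, object_name)


print(parse_oci_url("oci://bucket@namespace/path/file.parquet"))
# OCIPath(bucket='bucket', namespace='namespace', object_name='path/file.parquet')
```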
You can also use Dask, which accepts the same URLs, including glob patterns such as file*.csv:
from dask import dataframe as dd
ddf = dd.read_csv("oci://bucket@namespace/path/file*.csv",
                  storage_options={"config": "~/.oci/config"})
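The file*.csv part of the Dask path is a glob pattern. Dask reads every matching CSV object into a single Dask DataFrame. The snippet below roughly models which objects the pattern selects, using the standard library's fnmatch on a made-up list of object names. This is only an approximation. fnmatch's * also matches / characters, whereas fsspec-style globbing keeps * within a single path segment.

```python
from fnmatch import fnmatchcase

# Made-up object names standing in for the contents of bucket@namespace.
object_names = [
    "path/file1.csv",
    "path/file2.csv",
    "path/file1.parquet",
    "path/other.csv",
]

pattern = "path/file*.csv"
# fnmatchcase is always case-sensitive, unlike fnmatch on Windows.
matches = [name for name in object_names if fnmatchcase(name, pattern)]
print(matches)
# ['path/file1.csv', 'path/file2.csv']
```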
The storage_options parameter is a dictionary of keyword arguments that pandas and Dask pass to the underlying OCIFileSystem constructor. The following docstring lists the valid arguments. You can display it in IPython or Jupyter by appending ? to the class name:
ocifs.OCIFileSystem?
Init signature: ocifs.OCIFileSystem(*args, **kwargs)
Docstring:
Access oci as if it were a file system.
This exposes a filesystem-like API (ls, cp, open, etc.) on top of oci
storage.
Parameters
----------
config : Union[dict, str, None]
    Config for the connection to OCI.
    If a dict, it should be returned from oci.config.from_file
    If a str, it should be the location of the config file
    If None, user should have a Resource Principal configured environment
profile : str
    The profile to use from the config (If the config is passed in)
default_block_size: int (None)
    If given, the default block size value used for ``open()``, if no
    specific value is given at all time. The built-in default is 5MB.
config_kwargs : dict of parameters passed to the OCI Client upon connection
    This will first scan for `profile` (in the case of a passed in config)
    Or `resource_principal_token_path_provider` (in the case of no config)
    The rest will be passed to the ObjectStorageClient
    more info here: oci.object_storage.ObjectStorageClient.__init__
oci_additional_kwargs : dict of parameters that are used when calling oci api
    methods. Typically used for things like "retry_strategy" or .
kwargs : other parameters for oci session
    This includes default parameters for tenancy, namespace, and region
    Any other parameters are passed along to AbstractFileSystem's init method.
Examples
--------
>>> fs = OCIFileSystem(config=config) # doctest: +SKIP
>>> fs.ls('my-bucket@my-namespace/') # doctest: +SKIP
['my-file.txt']
>>> with fs.open('my-bucket@my-namespace/my-file.txt', mode='rb') as f: # doctest: +SKIP
...     print(f.read()) # doctest: +SKIP
b'Hello, world!'
Init docstring:
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen
a new instance is not required. The token attribute exists to allow
implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
Parameters
----------
use_listings_cache, listings_expiry_time, max_paths:
    passed to ``DirCache``, if the implementation supports
    directory listing caching. Pass use_listings_cache=False
    to disable such caching.
skip_instance_cache: bool
    If this is a cachable implementation, pass True here to force
    creating a new instance even if a matching instance exists, and prevent
    storing this instance.
asynchronous: bool
loop: asyncio-compatible IOLoop or None
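Put differently, the keys of storage_options become keyword arguments of OCIFileSystem. So storage_options={"config": "~/.oci/config", "profile": "DEFAULT"} corresponds to calling ocifs.OCIFileSystem(config="~/.oci/config", profile="DEFAULT") directly. The sketch below models that forwarding with a hypothetical RecordingFileSystem stand-in, so it runs without OCI credentials. In practice, the forwarding happens inside pandas and fsspec and involves more steps.

```python
from typing import Any, Dict, Optional


class RecordingFileSystem:
    """Hypothetical stand-in for ocifs.OCIFileSystem; it only stores its keyword arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


def make_filesystem(fs_cls: type, storage_options: Optional[Dict[str, Any]] = None):
    # storage_options are unpacked into keyword arguments: fs_cls(**storage_options).
    return fs_cls(**(storage_options or {}))


storage_options = {
    "config": "~/.oci/config",
    "profile": "DEFAULT",
    "default_block_size": 5 * 2**20,  # bytes, i.e. 5 MiB
}
fs = make_filesystem(RecordingFileSystem, storage_options)
print(fs.kwargs["profile"], fs.kwargs["default_block_size"])
# DEFAULT 5242880
```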
Quickstart with UNIX Operations
You can interact with the filesystem directly using most UNIX-style commands, such as ls, cp, exists, mkdir, rm, walk, and find.
Instantiate a filesystem from your configuration (see Getting Connected). Every filesystem instance operates within the home region of its configuration. Only the cp command supports cross-region operations, so you must create a separate filesystem instance for each region you work in.
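One way to follow the one-instance-per-region rule is to keep a small cache keyed by region. Each filesystem is created the first time its region is requested. The sketch below is hypothetical: RegionalFilesystems and its factory argument are made-up names. With ocifs, the factory would build an OCIFileSystem from whatever configuration targets that region, for example a config profile whose region is set accordingly. As the Init docstring above notes, fsspec instances may already be cached when similar arguments are seen. This sketch mainly makes the per-region bookkeeping explicit.

```python
from typing import Callable, Dict, Generic, TypeVar

FS = TypeVar("FS")


class RegionalFilesystems(Generic[FS]):
    """Hypothetical helper: lazily create and reuse one filesystem per region."""

    def __init__(self, factory: Callable[[str], FS]) -> None:
        self._factory = factory  # builds a filesystem configured for one region
        self._instances: Dict[str, FS] = {}

    def get(self, region: str) -> FS:
        # Build the instance on first request, then return the same object.
        if region not in self._instances:
            self._instances[region] = self._factory(region)
        return self._instances[region]


created = []


def fake_factory(region: str) -> dict:
    """Stand-in for constructing an OCIFileSystem that targets `region`."""
    created.append(region)
    return {"region": region}


pool = RegionalFilesystems(fake_factory)
ashburn = pool.get("us-ashburn-1")
assert pool.get("us-ashburn-1") is ashburn  # reused, not rebuilt
pool.get("us-phoenix-1")
print(created)
# ['us-ashburn-1', 'us-phoenix-1']
```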
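# Create a filesystem from the DEFAULT profile of ~/.oci/config.
# default_block_size is in bytes: 5*2**20 = 5,242,880 bytes (5 MiB).
fs = ocifs.OCIFileSystem(config="~/.oci/config", profile="DEFAULT", default_block_size=5*2**20)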
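The profile argument selects a section of the OCI config file, which uses an INI-style format. The sketch below parses a made-up config with Python's configparser to show what a profile selects. The OCIDs, fingerprint, and key path are placeholders. ocifs itself does not use code like this; it presumably loads the file through the OCI SDK (the docstring above mentions oci.config.from_file). In configparser, keys in [DEFAULT] are inherited by every other section, so the PHOENIX profile here differs only in its region. Check the OCI SDK documentation for how its own loader treats profiles.

```python
import configparser

# Placeholder values only; a real ~/.oci/config holds your own OCIDs and key path.
SAMPLE_CONFIG = """\
[DEFAULT]
user=ocid1.user.oc1..exampleuser
fingerprint=00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..exampletenancy
region=us-ashburn-1

[PHOENIX]
region=us-phoenix-1
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)

for profile in ("DEFAULT", "PHOENIX"):
    section = parser[profile]
    # Keys missing from PHOENIX fall back to the [DEFAULT] section.
    print(profile, section["region"], section["tenancy"])
# DEFAULT us-ashburn-1 ocid1.tenancy.oc1..exampletenancy
# PHOENIX us-phoenix-1 ocid1.tenancy.oc1..exampletenancy
```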
fs.ls("oci://bucket@namespace/path")
# []
fs.touch("oci://bucket@namespace/path/file.txt")
fs.exists("oci://bucket@namespace/path/file.txt")
# True
fs.cat("oci://bucket@namespace/path/file.txt")
# b''
# Caution: recursive=True deletes every object under the given path, here the entire bucket.
fs.rm("oci://bucket@namespace", recursive=True)
fs.exists("oci://bucket@namespace/path/file.txt")
# False
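For comparison, the same touch, exists, cat, and recursive rm sequence can be reproduced on local disk with only the Python standard library. This is an analogy to clarify what each call does, not ocifs code. A temporary directory stands in for the bucket, so nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stands in for oci://bucket@namespace
target = root / "path" / "file.txt"

print(sorted(p.name for p in root.iterdir()))  # like fs.ls on an empty location
# []

target.parent.mkdir(parents=True)  # local disks need real directories; Object Storage does not
target.touch()                     # like fs.touch
print(target.exists())             # like fs.exists
# True
print(target.read_bytes())         # like fs.cat, which returns bytes
# b''

shutil.rmtree(root)                # like fs.rm(..., recursive=True)
print(target.exists())
# False
```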
The following examples show how to use the OCIFileSystem and OCIFile objects.