Copyright (c) 2021, 2022 Oracle and/or its affiliates. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/

Getting Started with ocifs

The Oracle Cloud Infrastructure (OCI) Object Storage filesystem (ocifs) is an fsspec implementation for use with Object Storage.

Quickstart with Pandas

Begin by importing ocifs and pandas. Importing ocifs registers the oci protocol with pandas:

[1]:
import ocifs
import pandas as pd

Now that the oci protocol is registered with pandas, you can read from and write to Object Storage as easily as you can with local files. For example, you could read an Excel file, path/file.xls, from a bucket in your namespace and write it back as a Parquet file:

[ ]:
df = pd.read_excel("oci://bucket@namespace/path/file.xls",
                   storage_options={"config": "~/.oci/config"})
[ ]:
df.to_parquet("oci://bucket@namespace/path/file.parquet",
              storage_options={"config": "~/.oci/config"})
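
The paths in these cells follow the pattern oci://bucket@namespace/path. When the bucket and namespace come from variables, a small helper can keep that pattern in one place. The following is only a sketch: oci_uri and split_oci_uri are hypothetical helper names and are not part of ocifs.

```python
from typing import Tuple


def oci_uri(bucket: str, namespace: str, path: str = "") -> str:
    """Build an oci:// URI in the bucket@namespace/path form."""
    # Strip a leading slash so the result never contains a double slash.
    return f"oci://{bucket}@{namespace}/{path.lstrip('/')}"


def split_oci_uri(uri: str) -> Tuple[str, str, str]:
    """Split an oci:// URI into (bucket, namespace, path)."""
    prefix = "oci://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an oci:// URI: {uri!r}")
    # Everything before the first slash is the bucket@namespace part.
    location, _, path = uri[len(prefix):].partition("/")
    bucket, sep, namespace = location.partition("@")
    if not sep or not bucket or not namespace:
        raise ValueError(f"expected bucket@namespace in {uri!r}")
    return bucket, namespace, path


uri = oci_uri("bucket", "namespace", "path/file.xls")
print(uri)
print(split_oci_uri(uri))
```

The resulting string can be passed wherever the cells here use a literal oci:// path.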

You could also use Dask:

[ ]:
from dask import dataframe as dd

ddf = dd.read_csv("oci://bucket@namespace/path/file*.csv",
                  storage_options={"config": "~/.oci/config"})

The storage_options parameter is a dictionary of arguments that are passed to the underlying OCIFileSystem constructor. The following docstring lists the valid arguments:

[5]:
ocifs.OCIFileSystem?
Init signature: ocifs.OCIFileSystem(*args, **kwargs)
Docstring:
Access oci as if it were a file system.
This exposes a filesystem-like API (ls, cp, open, etc.) on top of oci
storage.
Parameters
----------
config : Union[dict, str, None]
    Config for the connection to OCI.
    If a dict, it should be returned from oci.config.from_file
    If a str, it should be the location of the config file
    If None, user should have a Resource Principal configured environment
profile : str
    The profile to use from the config (If the config is passed in)
default_block_size: int (None)
    If given, the default block size value used for ``open()``, if no
    specific value is given at all time. The built-in default is 5MB.
config_kwargs : dict of parameters passed to the OCI Client upon connection
    This will first scan for `profile` (in the case of a passed in config)
    Or `resource_principal_token_path_provider` (in the case of no config)
    The rest will be passed to the ObjectStorageClient
    more info here: oci.object_storage.ObjectStorageClient.__init__
oci_additional_kwargs : dict of parameters that are used when calling oci api
    methods. Typically used for things like "retry_strategy" or .
kwargs : other parameters for oci session
    This includes default parameters for tenancy, namespace, and region
    Any other parameters are passed along to AbstractFileSystem's init method.
Examples
--------
>>> fs = OCIFileSystem(config=config)  # doctest: +SKIP
>>> fs.ls('my-bucket@my-namespace/')  # doctest: +SKIP
['my-file.txt']
>>> with fs.open('my-bucket@my-namespace/my-file.txt', mode='rb') as f:  # doctest: +SKIP
...     print(f.read())  # doctest: +SKIP
b'Hello, world!'
Init docstring:
Create and configure file-system instance

Instances may be cachable, so if similar enough arguments are seen
a new instance is not required. The token attribute exists to allow
implementations to cache instances if they wish.

A reasonable default should be provided if there are no arguments.

Subclasses should call this method.

Parameters
----------
use_listings_cache, listings_expiry_time, max_paths:
    passed to ``DirCache``, if the implementation supports
    directory listing caching. Pass use_listings_cache=False
    to disable such caching.
skip_instance_cache: bool
    If this is a cachable implementation, pass True here to force
    creating a new instance even if a matching instance exists, and prevent
    storing this instance.
asynchronous: bool
loop: asyncio-compatible IOLoop or None
File:           ~/ocifs/ocifs/core.py
Type:           _Cached
Subclasses:
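
Because storage_options is forwarded to OCIFileSystem, the documented config and profile parameters can be collected once and reused for every read and write. A minimal sketch follows. make_storage_options is a hypothetical helper, and passing profile through storage_options is an assumption based on the docstring above.

```python
from typing import Dict, Optional


def make_storage_options(config: Optional[str] = "~/.oci/config",
                         profile: Optional[str] = None) -> Dict[str, str]:
    """Collect OCIFileSystem arguments for use as storage_options."""
    options: Dict[str, str] = {}
    if config is not None:
        # A str value is treated as the location of the config file.
        options["config"] = config
    if profile is not None:
        # Per the docstring, the profile applies when a config is passed in.
        options["profile"] = profile
    return options


storage_options = make_storage_options(profile="DEFAULT")
print(storage_options)

# The same dictionary could then be passed to pandas or Dask, for example:
# df = pd.read_excel("oci://bucket@namespace/path/file.xls",
#                    storage_options=storage_options)
```

Leaving config as None yields an empty dictionary, which corresponds to the Resource Principal case described in the docstring.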

Quickstart to UNIX Operations

You can interact with the filesystem directly using methods modeled on common UNIX commands, such as ls, cp, exists, mkdir, rm, walk, and find.

Instantiate a filesystem from your configuration (see Getting Connected). Every filesystem instance operates within the home region of its configuration, so you must create a separate filesystem instance for each region. The cp command is the only command that has cross-region support.

[3]:
fs = ocifs.OCIFileSystem(config="~/.oci/config", profile="DEFAULT", default_block_size=5*2**20)
[ ]:
fs.ls("oci://bucket@namespace/path")
# []
[ ]:
fs.touch("oci://bucket@namespace/path/file.txt")
[ ]:
fs.exists("oci://bucket@namespace/path/file.txt")
# True
[ ]:
fs.cat("oci://bucket@namespace/path/file.txt")
# b""
[ ]:
fs.rm("oci://bucket@namespace", recursive=True)
[ ]:
fs.exists("oci://bucket@namespace/path/file.txt")
# False
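
Since each filesystem instance works within a single region, code that touches several regions needs one instance per region. The sketch below keeps those instances in a dictionary keyed by region name. It assumes that region can be passed as a keyword argument, as the kwargs entry of the OCIFileSystem docstring suggests. The helper name, the factory parameter, and the region identifiers in the commented example are illustrative only.

```python
from typing import Any, Callable, Dict, Iterable


def _default_factory(**kwargs: Any) -> Any:
    # Imported lazily so the helper can be defined without ocifs installed.
    import ocifs
    return ocifs.OCIFileSystem(**kwargs)


def filesystems_by_region(regions: Iterable[str],
                          config: str = "~/.oci/config",
                          factory: Callable[..., Any] = _default_factory) -> Dict[str, Any]:
    """Create one filesystem instance per region (hypothetical helper)."""
    # Assumption: the filesystem accepts a `region` keyword argument.
    return {region: factory(config=config, region=region) for region in regions}


# Example call (needs ocifs and valid credentials):
# fs_by_region = filesystems_by_region(["us-ashburn-1", "us-phoenix-1"])
# fs_by_region["us-phoenix-1"].ls("oci://bucket@namespace/path")
```

The factory parameter exists only so the helper can be exercised with a stand-in object. In normal use the default, which constructs ocifs.OCIFileSystem, would apply.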

Following are examples of how you can use the OCIFileSystem and OCIFile objects.