Unix Operations

Important: The ocifs SDK isn’t a one-to-one adaptor between OCI Object Storage and UNIX filesystem operations. It’s a set of convenient wrappers that help Pandas read natively from Object Storage. It supports many of the common UNIX functions and much of the Object Storage API, though not all of either.

Following are examples of some of the most popular filesystem and file methods. First, you must instantiate your region-specific filesystem instance:

[1]:
from ocifs import OCIFileSystem

fs = OCIFileSystem(config="~/.oci/config")

Filesystem Operations

list

List the files in a bucket or subdirectory using ls:

[ ]:
fs.ls("bucket@namespace/")
# ['bucket@namespace/file.txt',
#  'bucket@namespace/data.csv',
#  'bucket@namespace/folder1/',
#  'bucket@namespace/folder2/']

ls accepts the following arguments:

1) compartment_id: a specific compartment from which to list.
2) detail: if True, return a list of dictionaries with various details about each object.
3) refresh: if True, ignore the cache and fetch a fresh listing.

[ ]:
fs.ls("bucket@namespace/", detail=True)
# [{'name': 'bucket@namespace/file.txt',
#   'etag': 'abcdefghijklmnop',
#   'type': 'file',
#   'timeCreated': <timestamp when artifact created>,
#  ... },
#  ...
# ]
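
Because detail=True returns plain dictionaries, ordinary Python filtering applies to the result. The following is a minimal sketch, assuming each entry carries the 'name' and 'type' keys shown above. The sample listing is hard-coded for illustration, the 'directory' type value is an assumption, and file_names is a hypothetical helper, not part of ocifs.

```python
# Illustrative sample shaped like the detail=True output above (not real data).
listing = [
    {"name": "bucket@namespace/file.txt", "type": "file"},
    {"name": "bucket@namespace/data.csv", "type": "file"},
    {"name": "bucket@namespace/folder1/", "type": "directory"},  # assumed type value
]


def file_names(entries):
    """Return the names of entries whose 'type' is 'file' (hypothetical helper)."""
    return [e["name"] for e in entries if e.get("type") == "file"]


# Narrow the file entries down to CSV files only.
csv_files = [n for n in file_names(listing) if n.endswith(".csv")]
print(csv_files)
```

With a real filesystem instance, the same helper could be applied to the result of fs.ls(..., detail=True).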

touch

The touch method, named after the UNIX touch command, creates an empty file in Object Storage. The optional data parameter accepts bytes and writes them to the new file.

[ ]:
fs.touch("bucket@namespace/newfile", data=b"Hello World!")
[ ]:
fs.cat("bucket@namespace/newfile")
# b'Hello World!'

copy

The copy method is a popular UNIX method, and it has a special role in ocifs as the only method capable of cross-tenancy calls. To use copy across regions, your IAM policy must permit cross-region reads and writes. Note: Another benefit of copy is that it can move large data between locations in Object Storage without storing anything locally.

[ ]:
fs.copy("bucket@namespace/newfile", "bucket@namespace/newfile-sydney",
        destination_region="ap-sydney-1")

rm

The rm method is another essential UNIX filesystem method. Beyond the path, it accepts one additional argument, recursive. When recursive=True, it is equivalent to rm -rf and deletes all files underneath the prefix.

[ ]:
fs.exists("oci://bucket@namespace/folder/file")
# True
[ ]:
fs.rm("oci://bucket@namespace/folder", recursive=True)
[ ]:
fs.exists("oci://bucket@namespace/folder/file")
# False
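
Since a recursive rm removes everything underneath a prefix, it can help to preview which keys fall under it before deleting. The following is a plain-Python model of folder-style prefix matching, not the way ocifs implements rm. The helper keys_under_prefix and the sample keys are hypothetical, and treating the prefix as "/"-delimited is an assumption.

```python
def keys_under_prefix(keys, prefix):
    """Return the keys that sit underneath `prefix`, treated as a folder."""
    folder = prefix.rstrip("/") + "/"
    return [k for k in keys if k.startswith(folder)]


# Hypothetical object names inside one bucket.
keys = [
    "folder/file",
    "folder/sub/deep.csv",
    "folder2/other",  # shares leading text with "folder" but is a different folder
    "top.txt",
]

to_delete = keys_under_prefix(keys, "folder")
print(to_delete)
```

Appending the "/" delimiter before matching keeps a sibling such as folder2 out of the preview.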

glob

Fsspec implementations, including ocifs, support UNIX glob patterns; see Globbing.

[ ]:
fs.glob("oci://bucket@namespace/folder/*.csv")
# ["bucket@namespace/folder/part1.csv", "bucket@namespace/folder/part2.csv"]
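
To make the pattern semantics concrete, here is a local sketch using Python's standard glob module on a temporary directory as a stand-in for a bucket. It shows that "*" stays within one directory level, while "**" with recursive=True also descends into subdirectories. The file names are made up for illustration, and fsspec's matching may differ in details.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "folder")
    os.makedirs(os.path.join(folder, "subdir"))
    # Create a few small files to match against.
    for rel in ("part1.csv", "part2.csv", "notes.txt", os.path.join("subdir", "file1.csv")):
        with open(os.path.join(folder, rel), "w") as f:
            f.write("a,b\n")

    # '*' matches names within a single directory level only.
    flat = sorted(
        os.path.relpath(p, folder) for p in glob.glob(os.path.join(folder, "*.csv"))
    )
    # '**' with recursive=True matches zero or more nested directories.
    deep = sorted(
        os.path.relpath(p, folder)
        for p in glob.glob(os.path.join(folder, "**", "*.csv"), recursive=True)
    )

print(flat)
print(deep)
```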

Dask has special support for reading from and writing to a set of files using glob expressions (Pandas doesn’t support glob); see Dask’s Glob support.

[ ]:
from dask import dataframe as dd

ddf = dd.read_csv("oci://bucket@namespace/folder/*.csv")
ddf.to_csv("oci://bucket@namespace/folder_copy/*.csv")

walk

Use the walk method, modeled on Python’s os.walk, to iterate through the subdirectories of a given path. It is a valuable method for finding every file within a bucket or folder.

[ ]:
# walk yields one (root, dirs, files) tuple per directory, like os.walk
for root, dirs, files in fs.walk("oci://bucket@namespace/folder"):
    print(root, dirs, files)
# bucket@namespace/folder ['subdir'] ['part1.csv', 'part2.csv']
# bucket@namespace/folder/subdir [] ['file1.csv', 'file2.csv']
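
fsspec documents walk as an os.walk-style iterator that yields (root, dirs, files) tuples. The following local sketch uses os.walk on a temporary directory as a stand-in for a bucket and flattens those tuples into a single list of file paths. The helper all_files and the file names are hypothetical.

```python
import os
import tempfile


def all_files(top):
    """Flatten os.walk's (root, dirs, files) tuples into sorted relative paths."""
    found = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), top))
    return sorted(found)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "subdir"))
    # Create one file at the top level and one in a subdirectory.
    for rel in ("part1.csv", os.path.join("subdir", "file1.csv")):
        with open(os.path.join(top, rel), "w") as f:
            f.write("x\n")
    paths = all_files(top)

print(paths)
```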

open

This method opens a file and returns an OCIFile object. There are examples of what you can do with an OCIFile in the next section.

File Operations

After calling open, you get an OCIFile object, which is subclassed from fsspec’s AbstractBufferedFile. This file object can do almost everything a UNIX file can. Following are a few examples; see the full list of methods.

read

The read method works exactly as you would expect with a UNIX file:

[ ]:
import fsspec

with fsspec.open("oci://bucket@namespace/folder/file", 'rb') as f:
    buffer = f.read()
[ ]:
from ocifs import OCIFileSystem

fs = OCIFileSystem()
with fs.open("oci://bucket@namespace/folder/file", 'rb') as f:
    buffer = f.read()
[ ]:
file = fs.open("oci://bucket@namespace/folder/file")
buffer = file.read()
file.close()
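
For large objects, reading in fixed-size chunks avoids holding the whole file in memory at once. The following is a minimal sketch that works with any binary file-like object exposing read(n). Here io.BytesIO stands in for a file returned by fs.open, and read_in_chunks is a hypothetical helper, not part of ocifs.

```python
import io


def read_in_chunks(f, chunk_size=4):
    """Yield successive chunks from a binary file-like object until EOF."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        yield chunk


# io.BytesIO stands in for a file opened with fs.open(..., "rb").
source = io.BytesIO(b"Hello World!")
chunks = list(read_in_chunks(source))
print(chunks)
```

In practice a much larger chunk_size would be used; the tiny value here only keeps the example readable.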

seek

The seek method is also valuable in navigating files:

[ ]:
fs.touch("bucket@namespace/newfile", data=b"Hello World!")
with fs.open("bucket@namespace/newfile") as f:
    f.seek(3)
    print(f.read(1))
    f.seek(0)
    print(f.read(1))

# b'l'
# b'H'
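
seek also takes an optional whence argument in Python's file interface, which allows offsets relative to the current position or to the end of the file. The following sketch uses io.BytesIO as a local stand-in. It assumes a file opened through ocifs follows the same seek and tell semantics, as fsspec's buffered files are designed to do.

```python
import io

f = io.BytesIO(b"Hello World!")

f.seek(6)                # absolute offset from the start of the file
word = f.read(5)         # reads bytes 6 through 10

f.seek(-1, io.SEEK_END)  # one byte before the end of the file
last = f.read(1)

f.seek(0)
f.seek(2, io.SEEK_CUR)   # two bytes forward from the current position
pos = f.tell()

print(word, last, pos)
```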

write

Use the write method to write bytes to a file opened in 'wb' mode:

[ ]:
with fsspec.open("oci://bucket@namespace/newfile", 'wb') as f:
    f.write(b"new text")  # returns the number of bytes written

with fsspec.open("oci://bucket@namespace/newfile", 'rb') as f:
    assert f.read() == b"new text"

Learn More

There are many more operations that you can use with ocifs; see the AbstractBufferedFile spec and the AbstractFileSystem spec.
