Unix Operations
Important: The ocifs SDK isn’t a one-to-one adaptor between OCI Object Storage and UNIX filesystem operations. It’s a set of convenience wrappers that help Pandas read natively from Object Storage. It supports many of the common UNIX functions and much of the Object Storage API, though not all of either.
Following are examples of some of the most popular filesystem and file methods. First, you must instantiate your region-specific filesystem instance:
[1]:
from ocifs import OCIFileSystem
fs = OCIFileSystem(config="~/.oci/config")
Filesystem Operations
list
List the files in a bucket or subdirectory using ls:
[ ]:
fs.ls("bucket@namespace/")
# ['bucket@namespace/file.txt',
# 'bucket@namespace/data.csv',
# 'bucket@namespace/folder1/',
# 'bucket@namespace/folder2/']
ls accepts the following arguments:
1) compartment_id: a specific compartment from which to list.
2) detail: if true, return a list of dictionaries with various details about each object.
3) refresh: if true, ignore the cache and fetch a fresh listing.
[ ]:
fs.ls("bucket@namespace/", detail=True)
# [{'name': 'bucket@namespace/file.txt',
# 'etag': 'abcdefghijklmnop',
# 'type': 'file',
# 'timeCreated': <timestamp when artifact created>,
# ... },
# ...
# ]
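The detailed listing is a plain list of dictionaries, so it can be filtered with ordinary Python. The following is a minimal sketch, not part of ocifs: file_names is a hypothetical helper, the sample data is hand-written in the shape shown above, and the 'directory' value for non-file entries is an assumption based on the usual fsspec convention.

```python
from typing import Dict, List


def file_names(entries: List[Dict]) -> List[str]:
    """Return the names of entries whose 'type' is 'file'."""
    return [entry["name"] for entry in entries if entry.get("type") == "file"]


# Hand-written sample shaped like the output of fs.ls(..., detail=True).
# The 'directory' type value is an assumption, not confirmed by the text above.
sample_listing = [
    {"name": "bucket@namespace/file.txt", "type": "file"},
    {"name": "bucket@namespace/folder1/", "type": "directory"},
    {"name": "bucket@namespace/data.csv", "type": "file"},
]

only_files = file_names(sample_listing)
```

With a real filesystem instance, the same helper could be applied to the result of fs.ls("bucket@namespace/", detail=True).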
touch
The UNIX touch command creates empty files in Object Storage. The data parameter accepts a bytestream and writes it to the new file.
[ ]:
fs.touch("bucket@namespace/newfile", data=b"Hello World!")
[ ]:
fs.cat("bucket@namespace/newfile")
# b"Hello World!"
copy
The copy method is a popular UNIX method, and it has a special role in ocifs as the only method capable of cross-tenancy calls. Your IAM Policy must permit you to read and write cross-region to use the copy method cross-region. Note: Another benefit of copy is that it can move large data between locations in Object Storage without needing to store anything locally.
[ ]:
fs.copy("bucket@namespace/newfile", "bucket@namespace/newfile-sydney",
        destination_region="ap-sydney-1")
rm
The rm method is another essential UNIX filesystem method. It accepts one additional argument beyond the path: recursive. When recursive=True, it is equivalent to an rm -rf command and deletes all files underneath the prefix.
[ ]:
fs.exists("oci://bucket@namespace/folder/file")
# True
[ ]:
fs.rm("oci://bucket@namespace/folder", recursive=True)
[ ]:
fs.exists("oci://bucket@namespace/folder/file")
# False
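Because a recursive rm deletes everything under a prefix, it can be useful to wrap the exists and rm calls together. The following is a minimal sketch under stated assumptions: remove_prefix is a hypothetical helper, and _FakeFS is a tiny in-memory stand-in written only so the example runs without Object Storage access. It is not the ocifs implementation.

```python
def remove_prefix(fs, path: str) -> bool:
    """Recursively delete `path` if it exists; return True if a delete was issued."""
    if not fs.exists(path):
        return False
    fs.rm(path, recursive=True)
    return True


class _FakeFS:
    """Hypothetical in-memory stand-in exposing only exists() and rm()."""

    def __init__(self, paths):
        self.paths = set(paths)

    def exists(self, path):
        prefix = path.rstrip("/") + "/"
        return any(p == path or p.startswith(prefix) for p in self.paths)

    def rm(self, path, recursive=False):
        prefix = path.rstrip("/") + "/"
        self.paths = {
            p for p in self.paths
            if p != path and not (recursive and p.startswith(prefix))
        }


fake = _FakeFS({"folder/file", "folder/sub/other", "keep/me"})
first = remove_prefix(fake, "folder")   # the prefix exists, so it is deleted
second = remove_prefix(fake, "folder")  # nothing is left under the prefix
```

With a real filesystem instance, the call would look like remove_prefix(fs, "oci://bucket@namespace/folder").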
glob
Fsspec implementations, including ocifs, support UNIX glob patterns. See Globbing.
[ ]:
fs.glob("oci://bucket@namespace/folder/*.csv")
# ["bucket@namespace/folder/part1.csv", "bucket@namespace/folder/part2.csv"]
Dask has special support for reading from and writing to a set of files using glob expressions (Pandas doesn’t support glob). See Dask’s Glob support.
[ ]:
from dask import dataframe as dd
ddf = dd.read_csv("oci://bucket@namespace/folder/*.csv")
ddf.to_csv("oci://bucket@namespace/folder_copy/*.csv")
walk
Use the walk method to iterate through the subdirectories of a given path. Like Python's os.walk, it yields a (path, directories, files) tuple for each directory, which makes it a valuable method for determining every file within a bucket or folder.
[ ]:
for path, dirs, files in fs.walk("oci://bucket@namespace/folder"):
    print(path, dirs, files)
# bucket@namespace/folder ['subdir'] ['part1.csv', 'part2.csv']
# bucket@namespace/folder/subdir [] ['file1.csv', 'file2.csv']
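In fsspec, walk is a generator that follows the os.walk convention and yields one (path, directories, files) tuple per directory. To get a flat list of full object paths, those tuples can be joined back together. The following is a minimal sketch: flatten_walk is a hypothetical helper, and the sample tuples are hand-written rather than real output.

```python
from typing import Iterable, List, Tuple

WalkEntry = Tuple[str, List[str], List[str]]


def flatten_walk(entries: Iterable[WalkEntry]) -> List[str]:
    """Join each directory path with its file names to get full object paths."""
    paths = []
    for dirpath, _dirnames, filenames in entries:
        for name in filenames:
            paths.append(f"{dirpath.rstrip('/')}/{name}")
    return paths


# Hand-written sample in the (path, dirs, files) shape that walk yields.
sample = [
    ("bucket@namespace/folder", ["subdir"], ["part1.csv", "part2.csv"]),
    ("bucket@namespace/folder/subdir", [], ["file1.csv", "file2.csv"]),
]

all_files = flatten_walk(sample)
```

With a real filesystem instance, the helper could be called as flatten_walk(fs.walk("oci://bucket@namespace/folder")).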
open
This method opens a file and returns an OCIFile object. There are examples of what you can do with an OCIFile in the next section.
File Operations
After calling open, you get an OCIFile object, which is subclassed from fsspec’s AbstractBufferedFile. This file object can do almost everything a UNIX file can. Following are a few examples. See the full list of methods for more.
read
The read method works exactly as you would expect with a UNIX file:
[ ]:
import fsspec
with fsspec.open("oci://bucket@namespace/folder/file", 'rb') as f:
    buffer = f.read()
[ ]:
from ocifs import OCIFileSystem
fs = OCIFileSystem()
with fs.open("oci://bucket@namespace/folder/file", 'rb') as f:
    buffer = f.read()
[ ]:
file = fs.open("oci://bucket@namespace/folder/file")
buffer = file.read()
file.close()
seek
The seek method is also valuable in navigating files:
[ ]:
fs.touch("bucket@namespace/newfile", data=b"Hello World!")
with fs.open("bucket@namespace/newfile") as f:
    f.seek(3)
    print(f.read(1))
    f.seek(0)
    print(f.read(1))
# b'l'
# b'H'
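Beyond absolute offsets, the standard Python file protocol also lets seek work relative to the current position or the end of the file. The following sketch uses io.BytesIO as a local stand-in so it runs without Object Storage access. It assumes an OCIFile opened for reading behaves the same way, since fsspec's AbstractBufferedFile follows the same seek(offset, whence) convention.

```python
import io

# Local in-memory stand-in for a file opened in binary read mode.
f = io.BytesIO(b"Hello World!")

f.seek(6)                  # absolute offset from the start of the file
word = f.read(5)           # reads bytes 6 through 10

f.seek(-1, io.SEEK_END)    # one byte before the end of the file
last = f.read()            # reads the final byte

f.seek(0)                  # back to the start
f.seek(2, io.SEEK_CUR)     # move two bytes forward from the current position
position = f.tell()        # current offset in the file
```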
write
You can use the write operation:
[ ]:
with fsspec.open("oci://bucket@namespace/newfile", 'wb') as f:
    f.write(b"new text")
with fsspec.open("oci://bucket@namespace/newfile", 'rb') as f:
    assert f.read() == b"new text"
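For large objects, reading everything into memory before writing may not be practical. The following is a minimal sketch of copying between two file-like objects in fixed-size chunks. copy_in_chunks is a hypothetical helper, and io.BytesIO objects stand in for files returned by open, so nothing here is specific to ocifs.

```python
import io


def copy_in_chunks(src, dst, chunk_size: int = 1024 * 1024) -> int:
    """Copy from src to dst in chunks; return the total number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for a file opened with 'rb' and another opened with 'wb'.
source = io.BytesIO(b"new text" * 1000)
target = io.BytesIO()
copied = copy_in_chunks(source, target, chunk_size=512)
```

With real files, source and target would be the objects returned by opening one path in 'rb' mode and another in 'wb' mode.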
Learn More
There are many more operations that you can use with ocifs. See the AbstractBufferedFile spec and the AbstractFileSystem spec.