{ "cells": [ { "cell_type": "markdown", "id": "aefd23b8", "metadata": {}, "source": [ "## Unix Operations" ] }, { "cell_type": "markdown", "id": "858282b9", "metadata": {}, "source": [ "_Important: The ocifs SDK isn't a one-to-one adaptor of OCI Object Storage and UNIX filesystem operations. It's a set of convenient wrappings to assist Pandas in natively reading from Object Storage. It supports many of the common UNIX functions, and many of the Object Storage API though not all._\n", "\n", "Following are examples of some of the most popular filesystem and file methods. First, you must instantiate your region-specific filesystem instance:" ] }, { "cell_type": "code", "execution_count": 1, "id": "9d450d82", "metadata": {}, "outputs": [], "source": [ "from ocifs import OCIFileSystem\n", "\n", "fs = OCIFileSystem(config=\"~/.oci/config\")" ] }, { "cell_type": "markdown", "id": "4c7a2cd9", "metadata": {}, "source": [ "### Filesystem Operations\n", "\n", "#### list\n", "List the files in a bucket or subdirectory using `ls`:" ] }, { "cell_type": "code", "execution_count": null, "id": "2dee4999", "metadata": {}, "outputs": [], "source": [ "fs.ls(\"bucket@namespace/\")\n", "# ['bucket@namespace/file.txt', \n", "# 'bucket@namespace/data.csv', \n", "# 'bucket@namespace/folder1/', \n", "# 'bucket@namespace/folder2/']" ] }, { "cell_type": "markdown", "id": "02c3a1bc", "metadata": {}, "source": [ "`list` has the following args: 1) `compartment_id`: a specific compartment from which to list. 2)`detail`: If true, return a list of dictionaries with various details about each object. 3)`refresh`: If true, ignore the cache and pull fresh." ] }, { "cell_type": "code", "execution_count": null, "id": "8abc4ef0", "metadata": {}, "outputs": [], "source": [ "fs.ls(\"bucket@namespace/\", detail=True)\n", "# [{'name': 'bucket@namespace/file.txt', \n", "# 'etag': 'abcdefghijklmnop',\n", "# 'type': 'file',\n", "# 'timeCreated': ,\n", "# ... },\n", "# ...\n", "# ]" ] }, { "cell_type": "markdown", "id": "fa2650b9", "metadata": {}, "source": [ "#### touch\n", "The UNIX `touch` command creates empty files in Object Storage. The `data` parameter accepts a bytestream and writes it to the new file." ] }, { "cell_type": "code", "execution_count": null, "id": "408d380c", "metadata": {}, "outputs": [], "source": [ "fs.touch(\"bucket@namespace/newfile\", data=b\"Hello World!\")" ] }, { "cell_type": "code", "execution_count": null, "id": "592180ce", "metadata": {}, "outputs": [], "source": [ "fs.cat(\"bucket@namespace/newfile\")\n", "# \"Hello World!\"" ] }, { "cell_type": "markdown", "id": "128b4656", "metadata": {}, "source": [ "#### copy\n", "The `copy` method is a popular UNIX method, and it has a special role in ocifs as the only method capable of cross-tenancy calls. Your IAM Policy must permit you to read and write cross-region to use the `copy` method cross-region. Note: Another benefit of `copy` is that it can move large data between locations in Object Storage without needing to store anything locally." ] }, { "cell_type": "code", "execution_count": null, "id": "f6c81ce8", "metadata": {}, "outputs": [], "source": [ "fs.copy(\"bucket@namespace/newfile\", \"bucket@namespace/newfile-sydney\",\n", " destination_region=\"ap-sydney-1\")" ] }, { "cell_type": "markdown", "id": "50c6a28b", "metadata": {}, "source": [ "#### rm\n", "The `rm` method is another essential UNIX filesystem method. It accepts one additional argument (beyond the path), `recursive`. When `recursive=True`, it is equivalent to an `rm -rf` command. It deletes all files underneath the prefix." ] }, { "cell_type": "code", "execution_count": null, "id": "aa74be1d", "metadata": {}, "outputs": [], "source": [ "fs.exists(\"oci://bucket@namespace/folder/file\")\n", "# True" ] }, { "cell_type": "code", "execution_count": null, "id": "fe753c02", "metadata": {}, "outputs": [], "source": [ "fs.rm(\"oci://bucket@namespace/folder\", recursive=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "79ea99e6", "metadata": {}, "outputs": [], "source": [ "fs.exists(\"oci://bucket@namespace/folder/file\")\n", "# False" ] }, { "cell_type": "markdown", "id": "4074ec28", "metadata": {}, "source": [ "#### glob\n", "Fsspec implementations, including ocifs, support UNIX glob patterns, see [Globbing](https://man7.org/linux/man-pages/man7/glob.7.html)." ] }, { "cell_type": "code", "execution_count": null, "id": "0f9271b7", "metadata": {}, "outputs": [], "source": [ "fs.glob(\"oci://bucket@namespace/folder/*.csv\")\n", "# [\"bucket@namespace/folder/part1.csv\", \"bucket@namespace/folder/part2.csv\"]" ] }, { "cell_type": "markdown", "id": "cde56557", "metadata": {}, "source": [ "Dask has special support for reading from and writing to a set of files using glob expressions (Pandas doesn't support glob), see [Dask's Glob support](https://docs.dask.org/en/latest/remote-data-services.html)." ] }, { "cell_type": "code", "execution_count": null, "id": "18e0ccab", "metadata": {}, "outputs": [], "source": [ "from dask import dataframe as dd\n", "\n", "ddf = dd.read_csv(\"oci://bucket@namespace/folder/*.csv\")\n", "ddf.to_csv(\"oci://bucket@namespace/folder_copy/*.csv\")" ] }, { "cell_type": "markdown", "id": "9ab01f8d", "metadata": {}, "source": [ "#### walk\n", "\n", "Use the UNIX `walk` method for iterating through the subdirectories of a given path. This is a valuable method for determining every file within a bucket or folder." ] }, { "cell_type": "code", "execution_count": null, "id": "6b352841", "metadata": {}, "outputs": [], "source": [ "fs.walk(\"oci://bucket@namespace/folder\")\n", "# [\"bucket@namespace/folder/part1.csv\", \"bucket@namespace/folder/part2.csv\",\n", "# \"bucket@namespace/folder/subdir/file1.csv\", \"bucket@namespace/folder/subdir/file2.csv\"]" ] }, { "cell_type": "markdown", "id": "f4f44be1", "metadata": {}, "source": [ "#### open\n", "This method opens a file and returns an `OCIFile` object. There are examples of what you can do with an `OCIFile` in the next section." ] }, { "cell_type": "markdown", "id": "8c3bd2ca", "metadata": {}, "source": [ "### File Operations\n", "After calling open, you get an `OCIFile` object, which is subclassed from fsspec's `AbstractBufferedFile`. This file object can do almost everything a UNIX file can. Following are a few examples, see [a full list of methods](https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=AbstractFileSystem#fsspec.spec.AbstractBufferedFile)." ] }, { "cell_type": "markdown", "id": "043178d2", "metadata": {}, "source": [ "#### read\n", "The `read` method works exactly as you would expect with a UNIX file:" ] }, { "cell_type": "code", "execution_count": null, "id": "47c8c713", "metadata": {}, "outputs": [], "source": [ "import fsspec\n", "\n", "with fsspec.open(\"oci://bucket@namespace/folder/file\", 'rb') as f:\n", " buffer = f.read()" ] }, { "cell_type": "code", "execution_count": null, "id": "9921ecfd", "metadata": {}, "outputs": [], "source": [ "from ocifs import OCIFileSystem\n", "\n", "fs = OCIFileSystem()\n", "with fs.open(\"oci://bucket@namespace/folder/file\", 'rb') as f:\n", " buffer = f.read()" ] }, { "cell_type": "code", "execution_count": null, "id": "f6947d24", "metadata": {}, "outputs": [], "source": [ "file = fs.open(\"oci://bucket@namespace/folder/file\")\n", "buffer = file.read()\n", "file.close()" ] }, { "cell_type": "markdown", "id": "83da3320", "metadata": {}, "source": [ "#### seek\n", "The `seek` method is also valuable in navigating files:" ] }, { "cell_type": "code", "execution_count": null, "id": "d4aac2be", "metadata": {}, "outputs": [], "source": [ "fs.touch(\"bucket@namespace/newfile\", data=b\"Hello World!\")\n", "with fs.open(\"bucket@namespace/newfile\") as f:\n", " f.seek(3)\n", " print(f.read(1))\n", " f.seek(0)\n", " print(f.read(1))\n", "\n", "# l\n", "# H" ] }, { "cell_type": "markdown", "id": "b5c279a9", "metadata": {}, "source": [ "#### write\n", "You can use the `write` operation:" ] }, { "cell_type": "code", "execution_count": null, "id": "306767fa", "metadata": {}, "outputs": [], "source": [ "with fsspec.open(\"oci://bucket@namespace/newfile\", 'wb') as f:\n", " buffer = f.write(b\"new text\")\n", " \n", "with fsspec.open(\"oci://bucket@namespace/newfile\", 'rb') as f:\n", " assert f.read() == b\"new text\"" ] }, { "cell_type": "markdown", "id": "9a51c4bb", "metadata": {}, "source": [ "### Learn More\n", "There are many more operations that you can use with `ocifs`, see the [AbstractBufferedFile spec](https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=AbstractFileSystem#fsspec.spec.AbstractBufferedFile) and the [AbstractFileSystem spec](https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=AbstractFileSystem#fsspec.spec.AbstractFileSystem)." ] }, { "cell_type": "code", "execution_count": null, "id": "d14f907a", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "ds", "language": "python", "name": "ds" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 5 }