Our agent can't do anything useful yet, so we need to give it tools. We'll start by letting it list the contents of a directory and see each entry's metadata (name, size, and whether it's a directory).
Before we integrate this function with our LLM agent, let's build the function itself. Remember, LLMs work with text, so the function should accept a directory path and return a string that represents the contents of that directory. Here's the signature:
def get_files_info(working_directory, directory="."):
For reference, here's my project structure so far:
project_root/
├── calculator/
│ ├── main.py
│ ├── pkg/
│ │ ├── calculator.py
│ │ └── render.py
│ └── tests.py
└── functions/
    └── get_files_info.py
The key idea is that the directory parameter will be treated as a relative path within the working_directory. We'll allow the LLM agent to specify which directory it wants to scan, but the working_directory will be set by us. This means we can limit the scope of directories and files that the LLM is able to view.
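working_dir_abs = os.path.abspath(working_directory)
# Join and normalize so that ".." segments are resolved before the check
target_dir = os.path.normpath(os.path.join(working_dir_abs, directory))
# Will be True or False: True only if target_dir is inside working_dir_abs
valid_target_dir = os.path.commonpath([working_dir_abs, target_dir]) == working_dir_abs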
If valid_target_dir is False, return an error string:
f'Error: Cannot list "{directory}" as it is outside the permitted working directory'
Now our LLM agent has some guardrails: we never want it to be able to perform any work outside the working_directory that we give it.
Without this restriction, the LLM might run amok anywhere on the machine, reading sensitive files or overwriting important data. This is a very important step that we'll bake into every function the LLM can call.
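To see how the guardrail behaves on different inputs, here is a minimal sketch that wraps the check in a helper. The name is_inside_working_dir is hypothetical and not part of the assignment, and the ValueError handling is an added assumption to cover Windows paths on different drives.

```python
import os


def is_inside_working_dir(working_directory, directory="."):
    """Return True if `directory` resolves to a path inside `working_directory`.

    Hypothetical helper for illustration; the name is not part of the assignment.
    """
    working_dir_abs = os.path.abspath(working_directory)
    # os.path.join discards working_dir_abs when `directory` is absolute,
    # so "/bin" stays outside the working directory and fails the check.
    target_dir = os.path.normpath(os.path.join(working_dir_abs, directory))
    try:
        common = os.path.commonpath([working_dir_abs, target_dir])
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
    return common == working_dir_abs
```

Called with "pkg" this returns True, while "../" and "/bin" both return False, because normalizing the joined path moves them outside the working directory.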
If the target path is not a directory, return this error string:
f'Error: "{directory}" is not a directory'
All of our "tool call" functions, including get_files_info, should always return a string. If errors can be raised inside them, we need to catch those errors and return a string describing the error instead. This will allow the LLM to handle errors gracefully.
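One way to follow this rule is to wrap the body of the function in a try/except block. The sketch below is illustrative only: list_dir_or_error is a hypothetical helper, and catching the broad Exception class is an assumption about how much you want to hide from the caller.

```python
import os


def list_dir_or_error(path):
    """Return directory entry names joined by newlines, or an error string.

    Hypothetical helper for illustration; it never raises to the caller.
    """
    try:
        entries = sorted(os.listdir(path))
        return "\n".join(entries)
    except Exception as e:
        # Convert any failure (missing path, permissions, ...) into text
        # that the LLM can read and react to.
        return f"Error: {e}"
```

Either way the caller gets a string back, so the agent loop never has to deal with a traceback.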
On success, the returned string should list one entry per line, in this format:
- README.md: file_size=1032 bytes, is_dir=False
- src: file_size=128 bytes, is_dir=True
- package.json: file_size=1234 bytes, is_dir=False
I've listed some useful standard library functions in the Tips section below.
The exact file sizes and even the order of files may vary depending on your operating system and file system. Your output doesn't need to match the example byte-for-byte, just the overall format.
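As a rough sketch of how that format could be produced, the helper below builds one line per entry and joins them with newlines. The name format_dir_listing is hypothetical, and it assumes the target directory has already been validated as described earlier.

```python
import os


def format_dir_listing(target_dir):
    """Build one "- name: file_size=N bytes, is_dir=BOOL" line per entry.

    Hypothetical helper for illustration; assumes target_dir was already
    checked to be a directory inside the working directory.
    """
    lines = []
    for name in os.listdir(target_dir):
        path = os.path.join(target_dir, name)
        size = os.path.getsize(path)
        is_dir = os.path.isdir(path)
        lines.append(f"- {name}: file_size={size} bytes, is_dir={is_dir}")
    return "\n".join(lines)
```

The size reported for a directory depends on the file system, which is one reason your numbers may differ from the example.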
Result for current directory:
- main.py: file_size=719 bytes, is_dir=False
- tests.py: file_size=1331 bytes, is_dir=False
- pkg: file_size=44 bytes, is_dir=True
Result for 'pkg' directory:
- calculator.py: file_size=1721 bytes, is_dir=False
- render.py: file_size=376 bytes, is_dir=False
Result for '/bin' directory:
Error: Cannot list "/bin" as it is outside the permitted working directory
Result for '../' directory:
Error: Cannot list "../" as it is outside the permitted working directory
To import from a subdirectory, use this syntax: from DIRNAME.FILENAME import FUNCTION_NAME
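Here, DIRNAME is the name of the subdirectory, FILENAME is the name of the file without the .py extension, and FUNCTION_NAME is the name of the function you want to import. With the project structure above, that would be: from functions.get_files_info import get_files_info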
Run and submit the CLI tests.
Here are some standard library functions you'll find helpful:
- os.path.abspath(): Get an absolute path from a relative path
- os.path.join(): Join two paths together safely (handles slashes)
- os.path.normpath(): Normalize a path (handles things like ..)
- os.path.commonpath(): Get the common sub-path shared by multiple paths
- os.listdir(): List the contents of a directory
- os.path.isdir(): Check if a path is a directory
- os.path.isfile(): Check if a path is a file
- os.path.getsize(): Get the size of a file
- .join(): Join a list of strings together with a given separator