Get Files

Our agent can't do anything useful until we give it tools. We'll start by allowing it to list the contents of a directory and see each entry's metadata (name, size, and whether it's a directory).

Before we integrate this function with our LLM agent, let's just build the function itself. Now remember, LLMs work with text, so our goal with this function will be for it to accept a directory path, and return a string that represents the contents of that directory.

Assignment

Make a new directory called functions in the root of your project (not within the calculator directory). Inside, create a new file called get_files_info.py. Start writing the following function there:

def get_files_info(working_directory, directory="."):

For reference, here's my project structure so far:

project_root/
├── calculator/
│   ├── main.py
│   ├── pkg/
│   │   ├── calculator.py
│   │   └── render.py
│   └── tests.py
└── functions/
    └── get_files_info.py

The key idea is that the directory parameter will be treated as a relative path within the working_directory. We'll allow the LLM agent to specify which directory it wants to scan, but the working_directory will be set by us. This means we can limit the scope of directories and files that the LLM is able to view.

Begin implementing the get_files_info function. First we need to validate that the path to the directory is inside the working_directory.
1. Use os.path.abspath() to get the absolute path of the working_directory. For example, if you pass in "calculator" as the working_directory, this might return something like "/home/steve/ai-agent-project/calculator".
2. Construct the full path to the target directory by calling os.path.join() with the absolute working_directory and the directory argument. To protect against shenanigans, also make sure to call os.path.normpath() on the combined path. This collapses segments like ".." and redundant separators, so "/home/steve/ai-agent-project/calculator/.." becomes "/home/steve/ai-agent-project". The calls should look something like this:
```
target_dir = os.path.normpath(os.path.join(working_dir_abs, directory))
```
3. Now check if target_dir falls within the absolute working_directory path. The safest way of doing this is to use os.path.commonpath(), which finds the longest sub-path shared by two paths. For example, if the working directory is "/home/steve/ai-agent-project/calculator" and the target directory is "/home/steve/ai-agent-project/calculator/pkg", then the common path will be "/home/steve/ai-agent-project/calculator". That is, the common path should be the same as the absolute working directory path – if the target directory is valid. You could code this expectation like so:
```
# Will be True or False
valid_target_dir = os.path.commonpath([working_dir_abs, target_dir]) == working_dir_abs
```
4. If the target directory does not fall within the working directory, return an error string in the following format:
```
f'Error: Cannot list "{directory}" as it is outside the permitted working directory'
```
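The four steps above can be sketched as a small standalone check. This is only an illustration, not the finished function: the helper name is_within is hypothetical, and it returns a tuple so the intermediate values are easy to inspect.

```python
import os


def is_within(working_directory, directory="."):
    """Hypothetical helper: return (target_dir, valid) for the given arguments."""
    working_dir_abs = os.path.abspath(working_directory)
    # os.path.join() discards working_dir_abs when `directory` is itself
    # absolute (e.g. "/bin"); normpath() then collapses any ".." segments.
    target_dir = os.path.normpath(os.path.join(working_dir_abs, directory))
    # The target is valid only if the shared prefix is the working directory.
    valid = os.path.commonpath([working_dir_abs, target_dir]) == working_dir_abs
    return target_dir, valid


print(is_within("calculator", "pkg")[1])   # True
print(is_within("calculator", "../")[1])   # False
print(is_within("calculator", "/bin")[1])  # False
```

Note that none of these calls touch the file system, so the check works even if the directories don't exist yet.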

Now our LLM agent has some guardrails: we never want it to be able to perform any work outside the working_directory that we give it.

Without this restriction, the LLM might run amok anywhere on the machine, reading sensitive files or overwriting important data. This is a very important step that we'll bake into every function the LLM can call.

If target_dir is not an existing directory (for example, it points to a file), again, return an error string:
```
f'Error: "{directory}" is not a directory'
```
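As a rough sketch, the check might look like the following. The helper name not_a_directory_error is hypothetical, and the temporary directory only exists to make the example runnable on its own.

```python
import os
import tempfile


def not_a_directory_error(target_dir, directory):
    """Hypothetical helper: return the error string, or None if target_dir is a directory."""
    if not os.path.isdir(target_dir):
        return f'Error: "{directory}" is not a directory'
    return None


with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "main.py")
    open(file_path, "w").close()  # create an empty file to test against
    dir_result = not_a_directory_error(tmp, ".")
    file_result = not_a_directory_error(file_path, "main.py")

print(dir_result)   # None
print(file_result)  # Error: "main.py" is not a directory
```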

All of our "tool call" functions, including get_files_info, should always return a string. If errors can be raised inside them, we need to catch those errors and return a string describing the error instead. This will allow the LLM to handle errors gracefully.

Iterate over the items in the target directory. For each of them, record the name, file size, and whether it's a directory itself. Use this data to build and return a string representing the contents of the target directory. It should be in the following format:
```
- README.md: file_size=1032 bytes, is_dir=False
- src: file_size=128 bytes, is_dir=True
- package.json: file_size=1234 bytes, is_dir=False
```
I've listed some useful standard library functions in the Tips section below.

The exact file sizes and even the order of files may vary depending on your operating system and file system. Your output doesn't need to match the example byte-for-byte, just the overall format.
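One possible way to build that string, shown here as a sketch rather than the required solution: the helper name describe_dir is hypothetical, and it assumes the path has already been validated.

```python
import os
import tempfile


def describe_dir(target_dir):
    """Hypothetical helper: one "- name: file_size=N bytes, is_dir=B" line per entry."""
    lines = []
    for name in os.listdir(target_dir):
        path = os.path.join(target_dir, name)
        size = os.path.getsize(path)
        is_dir = os.path.isdir(path)
        lines.append(f"- {name}: file_size={size} bytes, is_dir={is_dir}")
    return "\n".join(lines)


# Build a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "README.md"), "w") as f:
        f.write("hello")  # 5 bytes
    os.mkdir(os.path.join(tmp, "src"))
    listing = describe_dir(tmp)

print(listing)
```

The README.md line will report file_size=5 bytes; the size reported for src and the order of the two lines depend on your system.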

If any errors are raised by the standard library functions that you call, catch them and instead return a string describing the error. You may find it convenient to put everything in this function in a try/except block. When returning an error string, always prefix it with Error:.
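The try/except pattern could look something like this. The helper safe_listdir is hypothetical and much smaller than get_files_info, but the shape is the same: do the work inside try, and turn any exception into an "Error:" string.

```python
import os


def safe_listdir(path):
    """Hypothetical helper: list a directory, returning a string even on failure."""
    try:
        return "\n".join(os.listdir(path))
    except Exception as e:
        # Never let the exception escape; the LLM only ever sees a string.
        return f"Error: {e}"


print(safe_listdir("no-such-directory-hopefully"))
```

The exact message after "Error:" comes from the operating system, so it will differ between platforms.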

We need a way to manually debug our new get_files_info function! Create a new test_get_files_info.py file in the root of your project. When executed directly (uv run test_get_files_info.py) it should run the following function calls and print the results matching the formatting below (not necessarily the exact numbers).

get_files_info("calculator", "."):

Result for current directory:
  - main.py: file_size=719 bytes, is_dir=False
  - tests.py: file_size=1331 bytes, is_dir=False
  - pkg: file_size=44 bytes, is_dir=True

get_files_info("calculator", "pkg"):

Result for 'pkg' directory:
  - calculator.py: file_size=1721 bytes, is_dir=False
  - render.py: file_size=376 bytes, is_dir=False

get_files_info("calculator", "/bin"):

Result for '/bin' directory:
    Error: Cannot list "/bin" as it is outside the permitted working directory

get_files_info("calculator", "../"):

Result for '../' directory:
    Error: Cannot list "../" as it is outside the permitted working directory

To import from a subdirectory, use this syntax: from DIRNAME.FILENAME import FUNCTION_NAME

Where DIRNAME is the name of the subdirectory, FILENAME is the name of the file without the .py extension, and FUNCTION_NAME is the name of the function you want to import.
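To see the import syntax in isolation, here is a self-contained sketch. It builds a throwaway copy of the functions/get_files_info.py layout in a temporary directory, with a stub function standing in for your real one. In your actual project you would only need the import line, because Python already puts the directory of the script you run on its module search path.

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Stand-in for the project layout: <root>/functions/get_files_info.py
    os.mkdir(os.path.join(root, "functions"))
    with open(os.path.join(root, "functions", "get_files_info.py"), "w") as f:
        f.write(
            'def get_files_info(working_directory, directory="."):\n'
            '    return "stub"\n'
        )

    # Only needed here because the package lives in a temp directory.
    sys.path.insert(0, root)

    # DIRNAME = functions, FILENAME = get_files_info, FUNCTION_NAME = get_files_info
    from functions.get_files_info import get_files_info

    result = get_files_info("calculator", ".")

print(result)  # stub
```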

Run uv run test_get_files_info.py, and ensure your function works as expected.

Run and submit the CLI tests.

Tips

Here are some standard library functions you'll find helpful:

os.path.abspath(): Get an absolute path from a relative path
os.path.join(): Join two paths together safely (handles slashes; if the second path is absolute, the first is discarded)
os.path.normpath(): Normalize a path (handles things like ..)
os.path.commonpath(): Get the common sub-path shared by multiple paths
os.listdir(): List the contents of a directory
os.path.isdir(): Check if a path is a directory
os.path.isfile(): Check if a path is a file
os.path.getsize(): Get the size of a file
str.join(): Join a list of strings together with a given separator, e.g. "\n".join(...)