Hi, I'm Aaron.

Beware XY Problems with LLMs

LLMs

This began as one lengthy post, but I’ve broken it into several more digestible posts. This is part 3.


An XY problem is

where the question is about an end user’s attempted solution (X) rather than the root problem itself (Y or Why?).

This commonly happens when the attempted solution is born of an incomplete understanding of the problem. You can usually spot these questions because they are very specific and devoid of context. The right way to address them is to first ask for the greater context, to understand why the asker chose to solve it this way.

The Wikipedia article gives this example:

Asking about how to grab the last three characters in a filename (X) instead of how to get the file extension (Y), which may not consist of three characters
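To make the difference concrete, here is a minimal Python sketch of the two questions. The function names and sample filenames are my own, chosen for illustration; `os.path.splitext` is the standard library call that splits off an extension.

```python
import os.path


def last_three_chars(filename: str) -> str:
    """Answers X literally: return the final three characters."""
    return filename[-3:]


def file_extension(filename: str) -> str:
    """Answers Y: return the extension, without the leading dot."""
    return os.path.splitext(filename)[1].lstrip(".")


# Hypothetical sample filenames, for illustration only.
for name in ["report.pdf", "photo.jpeg", "notes.md"]:
    print(f"{name}: X={last_three_chars(name)!r} Y={file_extension(name)!r}")
```

The two agree for `report.pdf`, which is exactly why X looks like a reasonable question. They diverge as soon as the extension is not three characters long: `photo.jpeg` gives `peg` versus `jpeg`, and `notes.md` gives `.md` versus `md`.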

This becomes a problem with LLMs because LLMs generally do not push back to look for XY problems. They will give you their best guess at what you are asking for, whether it’s the correct approach or not.

Anecdotal example

On another occasion, a different coworker used an LLM to generate a script that performed a series of commands over SSH to a remote server. We reviewed the script together, focusing on what we felt were the riskiest parts (the SSH commands, predominantly) and found it to be acceptable.

However, we ran into a bug that wasn’t immediately apparent. The vast majority of server IDs had 5 digits, but some had 4. One of the early steps in the generated script was to parse the ID string, splitting off the last 3 digits to be used separately from the leading digits.

This happened because the sample data in the initial prompt had 5 digits, so the generated code only accounted for parsing metadata from 5-digit IDs.

This approach broke for server IDs with 4 digits because it always peeled off the first 2 digits rather than the last 3 – the right solution to the wrong problem. For example:

12345 => 12, 345 (Correct)
1234 => 12, 34 (Incorrect, should be: 1, 234)
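The generated script itself isn't reproduced here, so the following is only a hypothetical Python reconstruction of the behavior described above. Both function names are my own. The first anchors the split at the front of the string, the second at the back.

```python
def split_first_two(server_id: str) -> tuple[str, str]:
    """Hypothetical reconstruction of the bug: always take the first 2 characters."""
    return server_id[:2], server_id[2:]


def split_last_three(server_id: str) -> tuple[str, str]:
    """What was actually needed: always take the last 3 characters."""
    return server_id[:-3], server_id[-3:]


for sid in ["12345", "1234"]:
    print(sid, split_first_two(sid), split_last_three(sid))
```

For a 5-digit ID the two functions return the same pair, so sample data containing only 5-digit IDs cannot tell them apart.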

We overlooked the bug in the initial pass because we gave the LLM too much leeway and focused on the wrong aspect of the solution. We then found an existing script that had already solved this:

part1 = serverID / 1000
part2 = serverID % 1000

This corrected the issue.
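Those two lines rely on integer division, and how `/` behaves depends on the language. As a sketch of the same idea in Python (where `/` yields a float, so `//` or `divmod` is needed), assuming the ID is available as an integer; the function name is mine:

```python
def split_server_id(server_id: int) -> tuple[int, int]:
    """Split an integer ID into (leading digits, last three digits)."""
    # divmod returns (server_id // 1000, server_id % 1000) in one call.
    part1, part2 = divmod(server_id, 1000)
    return part1, part2


print(split_server_id(12345))  # (12, 345)
print(split_server_id(1234))   # (1, 234)
```

One caveat with the arithmetic version: the second part is a number, so leading zeros are lost (12045 yields 45, not 045). If a fixed-width string is needed downstream, it can be re-padded with something like `f"{part2:03d}"`.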

This is perhaps a soft XY issue, but it highlights the point about LLMs giving you exactly what you ask for, monkey’s paw style.