Fun idea. Rest of this post is my pure speculation. A direct implementation of this wouldn’t work today imo since LLMs don’t really understand and internalise information, being stochastic parrots and all. Like best case you would do this attack and the LLM will tell you that it obeys rhyming commands, but it won’t actually form the logic to identify a rhyming command and follow it. I could be wrong though, I am wilfully ignorant of the details of LLMs.
In the unlikely future where LLMs actually “understand” things, this would work, I think, if the attacks are started today. AI companies are so blase about their training data that this sort of thing would be eagerly fed into the gaping maws of the baby LLM, and once the understanding module works, the rhyming code will be baked into its understanding of language, as suggested by the article. As I mentioned tho, this would require LLMs to progress beyond sparroting, which I find unlikely.
Maybe with some tweaking, a similar attack could be effective today that is distinct from other prompt injections, but I am too lazy to figure that out for sure.