
Philosophical Research:The LLM Olympics

Revision as of 05:07, 16 February 2026 by Reversedragon (talk | contribs) (the test will not stop)

This is a collection of tests designed to demonstrate concretely the actual reasons LLMs shouldn't be used to answer questions, do formal logic, write papers, script videos, or even make alchemical combination games. Many tech demos clumsily attempt to show that LLMs can do things, and many opinion pieces go on at length about the purely philosophical reasons they "definitely" couldn't capture the unique human spirit the author purports to exist, but it is far less common to put tasks in front of LLMs that should be reasonable and keep guiding them onward and onward in good faith toward achieving those tasks until they absolutely break. (With the exception of "jailbreaking" research, of course.)

All these tests were run with Ollama, an offline LLM runtime that operates in a terminal window, on a standard consumer computer, using less than 2 GiB of RAM. Further technical specifications will be given on the individual test pages.

Rules

Three very important rules will be followed in all of these tests:

1) Absolutely no online models will be used, only models that can be run entirely offline. This is mainly an ethical concern: making sure that running the models does not use more computing power or rack space than a regular computer program. It also has the benefit of creating the simplest test cases, with no external variables. If the model is 1 GiB or less, with no extra 10 GiB hiding out of view, it is easier to know the full range of behaviors of the model; and if nobody else is running the model, there are no external actions "the company" can take while the test is running, such as datamining conversations or inserting ads. All the causes and effects inside the test are in one place.

2) No generated sentences will be directly copied onto any page. All the text on these pages is created manually. The longest quotations of generated text on these pages will be approximately three words long.

3) The LLM must not be given an unreasonable task, only tasks which fit within the boundaries of its known programming, bugs, and quirks. Each task will include several steps of "testing understanding" to make sure the LLM is giving the intended answers at every single step, before it is then given harder questions that require inference and are not directly explained in the text. Unless the task proves to be truly impossible, the test will not stop until the LLM actually completes the task.
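Rule 3 describes a two-phase protocol: comprehension checks first, then escalation to inference questions, repeated until success or a verdict of impossibility. It can be sketched roughly as follows; this is a minimal illustration, not the actual test harness, and `ask_model` is a hypothetical stub standing in for a query to a local model.

```python
def ask_model(question: str) -> str:
    # Hypothetical stub: the real tests would query a local model here.
    canned = {
        "warmup: what color is the sky in the story?": "blue",
        "inference: why did the character leave?": "unknown",
    }
    return canned.get(question, "unknown")

def run_test(warmup_checks, hard_questions, max_rounds=10):
    """Run 'testing understanding' checks first; escalate to inference
    questions only once every check passes, then keep going until the
    task is completed or has to be declared impossible."""
    # Phase 1: confirm the model is giving the intended answers.
    for question, expected in warmup_checks:
        if ask_model(question) != expected:
            return "failed comprehension check: " + question
    # Phase 2: harder questions requiring inference, retried each round.
    for _ in range(max_rounds):
        answers = [ask_model(q) for q in hard_questions]
        if all(a != "unknown" for a in answers):
            return "task completed"
    return "task declared impossible"  # the test does not stop before this

result = run_test(
    [("warmup: what color is the sky in the story?", "blue")],
    ["inference: why did the character leave?"],
)
print(result)
```

With this stub, the warmup check passes but the inference question never resolves, so the run ends in the "declared impossible" branch, mirroring the only condition under which a test is allowed to stop.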

Tests