Philosophical Research:The LLM Olympics
This is a collection of tests designed to solidly demonstrate the actual reasons LLMs shouldn't be used to answer questions, do formal logic, write papers, script videos, or even make alchemical combination games. Many tech demos clumsily attempt to demonstrate that LLMs {{em|can}} do things, and many opinion pieces go on and on about the purely philosophical reasons they "definitely" couldn't capture the unique human spirit the author purports to exist. It is much less common to put tasks in front of LLMs that {{em|should be reasonable}} and keep guiding them onward in good faith toward achieving those tasks until they absolutely break. (With the exception of "jailbreaking" research, of course.)

All these tests were run on a standard consumer computer using [https://docs.ollama.com/linux ollama], an offline LLM runtime that operates in a terminal window, in less than 2 gigabytes (GiB) of RAM. Further technical specifications will be given on the individual test pages.

== Rules ==

Three very important rules will be followed in all of these tests:

<strong>1)</strong> Absolutely no online models will be used, only models that can be run entirely offline. This is mainly for the ethical concern of making sure that running the models does not use more computing power or rack space than a regular computer program. However, it also has the benefit of creating the simplest test cases with no external variables. If there is only 1 gigabyte of model or less, and not 10 more gigabytes of model hiding out of view, it is easier to know the full range of behaviors of the model. If nobody else is running the model, there are no external actions "the company" can take while the test is running, such as datamining conversations or inserting ads. All the causes and effects inside the test will be in one place.

<strong>2)</strong> No generated sentences will be directly copied onto any page. All the text on these pages is created manually. The longest quotations of generated text on these pages will be approximately three words long.

<strong>3)</strong> The LLM must not be given an unreasonable task, only tasks which fit within the boundaries of its known programming, bugs, and quirks. Each task will include several steps of "testing understanding" to make sure the LLM is getting the intended answers at every single step. Only then will it be given harder questions that require inference and are not directly explained in the text. Unless the task proves to be truly impossible, the test will not stop until the LLM actually completes the task.

== Tests ==

<!--
* Context window test
* AI badly solves Deltarune
* Explaining wave machines
* Wavebuilder combinations test - make sure it is getting the same combinations, then start pushing it
* Real proposition test - Is or isn't Deng Xiaoping Thought an anarchism?
-->

[[Category:Thesis portals]]
[[Category:LLM Olympics (RD)]]