LLM Comparison for coding a game

Report created on: 2/4/2026

Comparison of LLMs for coding Flappy Bird (data collected 9 Feb - ... 2026)
LLM	Self reflection (prompt)	Prompt creation (prompt)	Code generation: Prompt	Response	Notes	Screenshot	Live game	Score
ChatGPT	Summary: Brand: 8/10 Gameplay: 10/10 Graphics: 9/10 Full response	Prompt	ChatGPT prompt	View	The ChatGPT response... After creating the... The resulting project is...		View	Brand: 0/10 Gameplay: 0/10 Graphics: 0/10
			Claude prompt	∅
			DeepSeek prompt	∅
			Gemini prompt	∅
			Llama prompt	∅
Claude	Summary: Brand: 8/10 Gameplay: 9/10 Graphics: 7/10 Full response	Prompt	ChatGPT prompt	View	The generated project... This command fixes the... The resulting project is...		View	Brand: 0/10 Gameplay: 0/10 Graphics: 0/10
			Claude prompt	View			View	Brand: 0/10 Gameplay: 0/10 Graphics: 0/10
			DeepSeek prompt	∅
			Gemini prompt	∅
			Llama prompt	∅
DeepSeek	Summary: Brand: 8/10 Gameplay: 9/10 Graphics: 7/10 Full response	Prompt	ChatGPT prompt	∅
			Claude prompt	∅
			DeepSeek prompt	View	DeepSeek does not guide... Manual action used:...		View	Brand: 0/10 Gameplay: 0/10 Graphics: 0/10
			Gemini prompt	∅
			Llama prompt	∅
Gemini	Summary: Brand: 10/10 Gameplay: 10/10 Graphics: 10/10 Full response	Prompt	ChatGPT prompt	∅
			Claude prompt	∅
			DeepSeek prompt	∅
			Gemini prompt	View			View	Brand: 0/10 Gameplay: 0/10 Graphics: 0/10
			Llama prompt	∅
Llama	Summary: Brand: 6/10 Gameplay: 8/10 Graphics: 9/10 Full response	Prompt	ChatGPT prompt	∅
			Claude prompt	∅
			DeepSeek prompt	∅
			Gemini prompt	∅
			Llama prompt	View	Llama provided guidance...		View	Brand: 0/10 Gameplay: 0/10 Graphics: 0/10

This summary is automatically generated from the experimentally collected data.