LLM Comparison for coding a game

Report created on: 2/4/2026

Comparison of LLMs for coding Flappy Bird (data collected 9 Feb - ... 2026)
Columns: LLM; Self reflection (prompt); Prompt creation (prompt); Code generation (Prompt, Response, Notes, Screenshot, Live game, Score)

ChatGPT
Self reflection summary:
  • Brand: 8/10
  • Gameplay: 10/10
  • Graphics: 9/10
Code generation:
  ChatGPT prompt
    Notes:
      1. The ChatGPT response...
      2. After creating the...
      3. The resulting project is...
    Score:
      • Brand: 0/10
      • Gameplay: 0/10
      • Graphics: 0/10
  Claude prompt
  DeepSeek prompt
  Gemini prompt
  Llama prompt

Claude
Self reflection summary:
  • Brand: 8/10
  • Gameplay: 9/10
  • Graphics: 7/10
Code generation:
  ChatGPT prompt
    Notes:
      1. The generated project...
      2. This command fixes the...
      3. The resulting project is...
    Score:
      • Brand: 0/10
      • Gameplay: 0/10
      • Graphics: 0/10
  Claude prompt
    Score:
      • Brand: 0/10
      • Gameplay: 0/10
      • Graphics: 0/10
  DeepSeek prompt
  Gemini prompt
  Llama prompt

DeepSeek
Self reflection summary:
  • Brand: 8/10
  • Gameplay: 9/10
  • Graphics: 7/10
Code generation:
  ChatGPT prompt
  Claude prompt
  DeepSeek prompt
    Notes:
      1. DeepSeek does not guide...
      2. Manual action used:...
    Score:
      • Brand: 0/10
      • Gameplay: 0/10
      • Graphics: 0/10
  Gemini prompt
  Llama prompt

Gemini
Self reflection summary:
  • Brand: 10/10
  • Gameplay: 10/10
  • Graphics: 10/10
Code generation:
  ChatGPT prompt
  Claude prompt
  DeepSeek prompt
  Gemini prompt
    Score:
      • Brand: 0/10
      • Gameplay: 0/10
      • Graphics: 0/10
  Llama prompt

Llama
Self reflection summary:
  • Brand: 6/10
  • Gameplay: 8/10
  • Graphics: 9/10
Code generation:
  ChatGPT prompt
  Claude prompt
  DeepSeek prompt
  Gemini prompt
  Llama prompt
    Notes:
      1. Llama provided guidance...
    Score:
      • Brand: 0/10
      • Gameplay: 0/10
      • Graphics: 0/10

This summary is automatically generated from the experimentally collected data.