
GENuineAI - Mimicking the human mimickers

Can you distinguish generative AI content from human content? Can an AI model?

Published Jan 11, 2025
Over the past few years, generative AI content has become part of our daily lives, and distinguishing it from human-generated content has become an area of major concern. Even when generated content contains blatant mistakes, like ChatGPT claiming that there are only two r's in strawberry, in the right context this can make it even harder to distinguish from human content. Do people know enough about the common patterns of AI-generated content to be able to distinguish it, and even mimic it? How easily can AI models distinguish AI-generated text from human text? Could a game answer these questions while turning the task of distinguishing human from AI into something fun?
These questions led me to the idea of creating a party game, similar to "Cards Against Humanity" or "Fibbage", where players receive creative prompts (such as "Alternative medicine is now embracing the curative powers of ______.") and the goal is to craft responses that are convincing enough to be mistaken for AI-generated content. Points are awarded for successfully deceiving others and for correctly identifying AI-generated responses. The game has two modes: a single-player mode and a multi-player mode. In single-player mode, a single user competes with three AI models over several rounds, and in each round the user and the AI models are given the same prompt. Once all AI models and the user have entered their responses, each AI model is given the answers from all other players and must decide which answer was written by the human. In multi-player mode, this concept is flipped: several human players and a single AI model enter their responses, and each human player must work out which response was generated by the AI.
Game link: https://main.duda015btsx1v.amplifyapp.com/

First approach

I started with the absolute opposite of the approach I would normally take in a software project - I wanted to see if Amazon Q Developer could generate a complete project from scratch when provided with an in-depth description of the multi-player mode of my game and my preferred infrastructure choices, such as using AWS Amplify for rapid game development and React for the UI. In general, I found that the generated code closely matched what I needed for my project - React components to represent the game lobby and answer input pages, Lambda functions to generate AI answers, a game state to represent a lobby, its rounds and its participants, and so on. Of course, there were some issues with the code as it was generated, such as Amazon Q often imagining methods that did not actually exist for an object, but copying the error into the Amazon Q chat was often sufficient to solve it. However, the biggest problem I found was that Amazon Q often generated code that mixed concepts from Gen 1 and Gen 2 of AWS Amplify, and in some areas did not use the methods that AWS Amplify provides, such as creating a large amount of code to update user state over a WebSocket instead of using the AWS Amplify client and `observeQuery` method. I was left with a large, messy code base that would have taken me a long time to debug.
Amazon Q created the following project structure:

Taking a step back

Instead of continuing down this path, I took a step back and followed the normal approach I would take for a project, starting by following a guide to create an example version of a Gen 2 AWS Amplify app. I then brainstormed the user flow when playing this game - a homepage, a lobby, a waiting room, an answer entry page, a voting page and an end-game score display page - and went through them one by one, adding the backend and frontend infrastructure for each page.
Doing so gave me the following simplified project structure instead:
In addition, the frontend uses React, Tailwind CSS and shadcn/ui components.
As problems arose, I used Amazon Q to discuss the issues I was facing with the game state or Amazon Bedrock AI model responses, which gave me much more useful output to work with. For example, Amazon Q helped me completely rewrite my state update logic to ensure consistency between players while making the code simpler to maintain. Amazon Q also helped me to improve my initial AI responses, and provided a summary of these changes with the following:

Amazon Bedrock for prompt response and AI/human detection

Once I had the flow of keeping state up to date for all users working with the AWS Amplify `observeQuery` method, I added the connection to Amazon Bedrock to generate AI responses, and experimented with different models to find one that would provide creative but consistently structured responses to prompts.
I developed two different Lambda functions for prompting the generative AI model. The first (used in both single-player and multi-player mode) provides the game prompt to the AI and asks for a creative response:
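To give a sense of what this first function looks like, here is a minimal sketch of an answer-generation Lambda that calls Amazon Bedrock through the Converse API in the AWS SDK for JavaScript v3. The handler shape, model ID and prompt wording are illustrative assumptions rather than the exact code from my project:

```typescript
// Sketch of the answer-generation Lambda. The handler shape, model ID and
// prompt wording are illustrative assumptions, not the exact project code.
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const bedrock = new BedrockRuntimeClient({});

export const handler = async (event: { prompt: string }) => {
  const command = new ConverseCommand({
    modelId: "mistral.mistral-7b-instruct-v0:2", // assumed model choice
    messages: [
      {
        role: "user",
        content: [
          {
            text:
              `Fill in the blank in the following party-game prompt with a short, ` +
              `creative 2-3 word answer. Return only the answer, nothing else.\n\n` +
              `Prompt: ${event.prompt}`,
          },
        ],
      },
    ],
    inferenceConfig: { maxTokens: 50, temperature: 0.9 },
  });

  const response = await bedrock.send(command);
  // Pull the text out of the first content block of the model's reply.
  const answer = response.output?.message?.content?.[0]?.text ?? "";
  return { answer: answer.trim() };
};
```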
The second, used only in single-player mode, provides context and requests the AI model to select the human answer from the provided answers to the prompt:
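Similarly, here is a rough sketch of how the detection Lambda could be structured, assuming it receives the round's prompt and the full list of answers; again, the input shape, prompt wording and model ID are my assumptions:

```typescript
// Sketch of the human-detection Lambda used in single-player mode. The input
// shape, prompt wording and model ID are illustrative assumptions.
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const bedrock = new BedrockRuntimeClient({});

interface DetectEvent {
  prompt: string;     // the round's fill-in-the-blank prompt
  answers: string[];  // answers from the human and the other AI players
}

export const handler = async (event: DetectEvent) => {
  const numbered = event.answers
    .map((answer, i) => `${i + 1}. ${answer}`)
    .join("\n");

  const command = new ConverseCommand({
    modelId: "mistral.mistral-7b-instruct-v0:2", // assumed model choice
    messages: [
      {
        role: "user",
        content: [
          {
            text:
              `The prompt was: "${event.prompt}". Exactly one of the following ` +
              `answers was written by a human; the rest were written by AI models.\n\n` +
              `${numbered}\n\n` +
              `Reply with only the number of the answer you believe is human.`,
          },
        ],
      },
    ],
    inferenceConfig: { maxTokens: 5, temperature: 0 },
  });

  const response = await bedrock.send(command);
  const text = response.output?.message?.content?.[0]?.text ?? "";
  // Parse the first number in the reply and convert it to a zero-based index.
  const match = text.match(/\d+/);
  return { humanGuessIndex: match ? Number(match[0]) - 1 : -1 };
};
```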

Amazon S3 for Prompt Storage

After integrating AI answers into the game, I replaced the locally stored prompts with a much larger collection of prompts in an S3 bucket, with a random collection pulled at the start of every game.
S3 Bucket for Prompt Storage
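As a rough illustration, fetching a random handful of prompts from the bucket could look something like the following sketch, where the bucket name, object key and JSON layout are all assumptions on my part:

```typescript
// Sketch of pulling a random selection of prompts from S3 at the start of a game.
// The bucket name, object key and JSON layout are illustrative assumptions.
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

export async function getRandomPrompts(count: number): Promise<string[]> {
  const response = await s3.send(
    new GetObjectCommand({
      Bucket: process.env.PROMPT_BUCKET_NAME, // assumed environment variable
      Key: "prompts.json",                    // assumed object key
    })
  );

  // prompts.json is assumed to be a JSON array of prompt strings.
  const allPrompts: string[] = JSON.parse(
    (await response.Body?.transformToString()) ?? "[]"
  );

  // Shuffle (Fisher-Yates) and take the first `count` prompts for this game.
  for (let i = allPrompts.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [allPrompts[i], allPrompts[j]] = [allPrompts[j], allPrompts[i]];
  }
  return allPrompts.slice(0, count);
}
```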

Challenges I faced

The first challenge I faced was how to correctly ensure consistency of state across all users as they join a lobby. The recommended way to do this with AWS Amplify is with an `observeQuery` subscription inside a `useEffect` hook, in which local React state is then set using `useState`. All the examples I saw of this pattern used an empty dependency array with the `useEffect` hook, but this did not make sense in my case, where a user could be in or out of a lobby and the round subscription would need to change. After a few iterations, working alongside Amazon Q, I was able to come up with a working solution that was not too complicated to maintain (a sketch of the pattern is below).
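A minimal sketch of that pattern looks something like the following, assuming a `Round` data model with a `lobbyId` field (both names are illustrative, not necessarily what the project uses):

```typescript
// Sketch of the subscription pattern with a non-empty dependency array.
// The `Round` model name and `lobbyId` filter field are illustrative assumptions.
import { useEffect, useState } from "react";
import { generateClient } from "aws-amplify/data";
import type { Schema } from "../amplify/data/resource";

const client = generateClient<Schema>();

export function useLobbyRounds(lobbyId: string | null) {
  const [rounds, setRounds] = useState<Schema["Round"]["type"][]>([]);

  useEffect(() => {
    // No lobby yet, so nothing to subscribe to.
    if (!lobbyId) return;

    const subscription = client.models.Round.observeQuery({
      filter: { lobbyId: { eq: lobbyId } },
    }).subscribe({
      next: ({ items }) => setRounds([...items]),
    });

    // Re-run (and clean up) whenever the user joins or leaves a lobby.
    return () => subscription.unsubscribe();
  }, [lobbyId]);

  return rounds;
}
```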
Secondly, I struggled with the formatting of responses generated by some of the AI models I chose through Amazon Bedrock. I started with the Amazon "Titan Text G1 - Lite" model, with a prompt that very explicitly stipulated that the model should only return a 2-3 word response and should not repeat or modify the original prompt. However, here are some prompt-response examples:
  • Prompt: "I knew it was true love when they _____" 
    • Response: "I knew it was true love when she dislocated her elbow to get my favorite pillow."
  • Prompt: "If I were president, my first action would be to _____"
    • Response: "'\n' + 'to ban all forms of social media and limit internet usage, making it impossible to _________ for any information.\n' + 'If I were president, my first action would be to ban all forms of social media and limit internet usage, making it impossible to get any genuine information.',"
  • Prompt: "Never bring a _____ to a pillow fight"
    • Response: "\nNever bring a ____ to a pillow fight"
At first, I tried to format the output to remove the original prompt text from the answer, but given the variety of prompts I wanted to provide and the inconsistency with which responses were formatted, I decided to try a different model. After experimenting with a few more models, I found that the Mistral AI 7B Instruct model generated consistently structured yet creative responses and was able to adapt to a range of prompt types (a sketch of the request format follows the examples below), such as:
  • Prompt: "If I were president, my first action would be to _____" 
    • Response: ' dance the cha-cha in the Oval Office'
  • Prompt: "It's bold. It's aerodynamic. It's finally legal. It's time for _______!"
    • Response: ' Broom closet racing'
  • Prompt: "Having problems with ______? Try ______!"
    • Response: ' Having problems with dancing elephants? Try mango tap-dancing lessons'
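For reference, here is a rough sketch of what an invocation of the Mistral 7B Instruct model through the Bedrock runtime can look like, using the model's native request format; the prompt wording and inference parameters are my assumptions, and the leading space visible in the examples above is why the result gets trimmed:

```typescript
// Sketch of invoking Mistral 7B Instruct directly through the Bedrock runtime.
// The exact prompt wording and inference parameters are illustrative assumptions.
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";

const bedrock = new BedrockRuntimeClient({});

export async function generateMistralAnswer(gamePrompt: string): Promise<string> {
  const body = JSON.stringify({
    // Mistral instruct models expect the [INST] ... [/INST] wrapper.
    prompt:
      `<s>[INST] Fill in the blank with a short, creative 2-3 word answer. ` +
      `Return only the answer. Prompt: ${gamePrompt} [/INST]`,
    max_tokens: 50,
    temperature: 0.9,
  });

  const response = await bedrock.send(
    new InvokeModelCommand({
      modelId: "mistral.mistral-7b-instruct-v0:2",
      contentType: "application/json",
      accept: "application/json",
      body,
    })
  );

  const payload = JSON.parse(new TextDecoder().decode(response.body));
  // The model tends to prefix its completion with a space, so trim it off.
  return (payload.outputs?.[0]?.text ?? "").trim();
}
```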

What's next for GENuineAI?

I see three main paths for the advancement of this game. The first is the creation of new implementations of the fundamental concept - can generative AI mimic humans in making crude drawings on a phone or tablet? If it can, can a human or an AI model distinguish this human-mimicking AI art from quickly, crudely drawn human art? This game mode would require a range of prompts that users provide drawn responses to, and could be built by combining the image-generation powers of the Titan Image Generator G1 model with the image-recognition powers of Claude 3.
Secondly, the game could be further developed with custom models, fine-tuned on the responses deemed "funny" or "interesting" by humans, or even made to mimic a specific person over time. These advancements would heighten the difficulty of the game and ensure that users do not get bored by repetitive or predictable AI model responses.
Lastly, I think this game could serve as an interesting platform for testing AI models. Different AI models could be used for text generation and for human/AI text recognition, and the scores achieved by these models across different games could be used as a metric for comparing them.