Senior Software Engineer
HUMANS 2.0

Project Overview
Humans 2.0 is a cutting-edge application developed with Angular and Node.js, designed to deliver immersive and interactive avatar experiences. Leveraging Three.js, the app seamlessly loads and renders .fbx/.glb avatars within a dynamic canvas environment. Users can engage in multimodal conversations with their avatars through a chat interface, where animations are synchronized with user inputs and AI-generated responses. This project integrates advanced AI technologies to create lifelike and responsive interactions, making it a standout solution in the realm of interactive digital experiences.
Key Technical Features
- Avatar Rendering and Animation:
  - Three.js Integration: Utilized Three.js for loading and rendering .fbx/.glb avatar models on a WebGL canvas, ensuring high-performance, realistic 3D graphics.
  - Animation Management: Implemented a robust system for handling avatar animations using a library of local .fbx files; animations are dynamically triggered by user interactions and AI-generated cues (see the loading sketch after this group).
  - Lifelike Movements: Synchronized facial and body animations with user speech through viseme generation, enhancing the realism and engagement of avatar interactions.
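The loading-and-playback path described above roughly follows the standard Three.js pattern sketched below: a GLTFLoader brings in the .glb avatar, an FBXLoader pulls clips from the local animation library, and an AnimationMixer cross-fades between them. This is a minimal sketch under those assumptions; function names such as loadAvatar and playClip are illustrative, not the project's actual API.

```ts
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';
import { FBXLoader } from 'three/examples/jsm/loaders/FBXLoader.js';

const scene = new THREE.Scene();
const clock = new THREE.Clock();
let mixer: THREE.AnimationMixer | undefined;

// Load a .glb avatar and attach an AnimationMixer to it.
export async function loadAvatar(url: string): Promise<THREE.Object3D> {
  const gltf = await new GLTFLoader().loadAsync(url);
  mixer = new THREE.AnimationMixer(gltf.scene);
  scene.add(gltf.scene);
  return gltf.scene;
}

// Pull a clip from the local .fbx animation library and cross-fade into it.
export async function playClip(url: string, fadeSeconds = 0.3): Promise<void> {
  if (!mixer) return;
  const fbx = await new FBXLoader().loadAsync(url);
  const action = mixer.clipAction(fbx.animations[0]);
  action.reset().fadeIn(fadeSeconds).play();
}

// Called once per frame from the render loop to advance the active animation.
export function updateAnimations(): void {
  mixer?.update(clock.getDelta());
}
```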
- AI and Chat Integration:
  - OpenAI Chat Completions API: Integrated the Chat Completions API (with support for other Large Language Models) to generate structured, contextually relevant responses; each response carries animation cues that select appropriate animations from the avatar's library (see the prompt sketch after this group).
  - Speech Recognition: Enabled user speech input via microphone, converting speech to text with Google's Speech-to-Text API for hands-free interaction.
  - Text Input Support: Provided an alternative way for users to type directly into the chat interface, keeping interaction with the avatar flexible.
  - Response Analysis: Parsed AI-generated responses to extract facial and body animation cues, driving real-time avatar animations that follow the conversation flow.
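A minimal sketch of how structured replies with animation cues can be requested from the Chat Completions API via the official openai Node SDK. The AvatarTurn shape, prompt wording, and model name are assumptions for illustration, not the project's exact schema.

```ts
import OpenAI from 'openai';

// Assumed response shape: the reply text plus cues matched against the
// avatar's local animation library.
interface AvatarTurn {
  text: string;
  facialAnimation: string; // e.g. "smile"
  bodyAnimation: string;   // e.g. "wave"
}

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function getAvatarTurn(userMessage: string): Promise<AvatarTurn> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o', // any chat-capable model works here
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content:
          'Reply as the avatar. Return JSON with keys "text", ' +
          '"facialAnimation" and "bodyAnimation", choosing animations ' +
          'from the provided library names.',
      },
      { role: 'user', content: userMessage },
    ],
  });
  return JSON.parse(completion.choices[0].message.content ?? '{}') as AvatarTurn;
}
```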
- Speech Synthesis and Synchronization:
  - Microsoft Azure Text-to-Speech API: Synthesized speech from AI-generated text with Azure Text-to-Speech, producing natural, clear audio responses.
  - Viseme Generation: Generated visemes (visual representations of phonemes) to synchronize avatar mouth movements with the synthesized audio, ensuring realistic, coherent lip-syncing (a synthesis-and-viseme sketch follows this group).
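The synthesis-and-viseme path can be sketched with the microsoft-cognitiveservices-speech-sdk package, whose synthesizer raises a visemeReceived event alongside the audio. The voice name and the scheduleViseme helper below are placeholders, not the project's actual code.

```ts
import * as sdk from 'microsoft-cognitiveservices-speech-sdk';

// Hypothetical helper that queues a mouth shape at a given playback time (ms).
declare function scheduleViseme(visemeId: number, atMs: number): void;

const speechConfig = sdk.SpeechConfig.fromSubscription(
  process.env.AZURE_SPEECH_KEY!,
  process.env.AZURE_SPEECH_REGION!,
);
speechConfig.speechSynthesisVoiceName = 'en-US-JennyNeural'; // placeholder voice

export function speakWithVisemes(text: string): void {
  const synthesizer = new sdk.SpeechSynthesizer(speechConfig);

  // Each viseme event carries an id (mouth shape) and an offset in 100-ns ticks.
  synthesizer.visemeReceived = (_sender, e) => {
    scheduleViseme(e.visemeId, e.audioOffset / 10_000);
  };

  synthesizer.speakTextAsync(
    text,
    result => {
      // result.audioData is an ArrayBuffer that can be played back in the browser.
      synthesizer.close();
    },
    error => {
      console.error(error);
      synthesizer.close();
    },
  );
}
```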
- Multimodal Chat Capabilities:
  - Image Detection and Generation: Supported image understanding within the chat and image generation with DALL-E 3, allowing avatars to respond with relevant visuals based on the conversation context (see the sketch after this group).
  - API Integration via Zapier: Enabled access to a wide array of external APIs through Zapier, expanding the application's functionality and integration potential for diverse user needs.
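A hedged sketch of the multimodal path: requesting an image from DALL-E 3 through the openai SDK and forwarding structured events to a Zapier catch hook. The webhook URL and payload shape are assumptions, not the project's actual endpoints.

```ts
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Ask DALL-E 3 for an image the chat UI can show next to the avatar's reply.
export async function generateSceneImage(prompt: string): Promise<string | undefined> {
  const result = await openai.images.generate({
    model: 'dall-e-3',
    prompt,
    n: 1,
    size: '1024x1024',
  });
  return result.data?.[0]?.url;
}

// Forward a structured event to external services through a Zapier webhook.
// The URL below is a placeholder for a Zapier "Catch Hook" trigger.
export async function notifyZapier(event: Record<string, unknown>): Promise<void> {
  await fetch('https://hooks.zapier.com/hooks/catch/...', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event),
  });
}
```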
- User Interaction Features:
  - Comprehensive Camera Controls: Provided extensive camera controls, letting users adjust viewing angle, zoom, and focus within the 3D environment for a personalized experience (a camera-control sketch follows this group).
  - Animation Triggering: Exposed the full animation list for manual triggering of avatar movements, giving users greater control over interactions and enabling custom scenarios.
  - Pre-Built Scenarios: Included pre-built scenarios such as game modes and a news anchor mode, enhancing the app's versatility and engagement through specialized interaction modes.
  - Multi-Avatar Support: Allowed loading two avatars on stage, each driven by a separate AI instance with a distinct personality, enabling dynamic dialogues between avatars.
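The camera controls described above map naturally onto Three.js's stock OrbitControls add-on; the sketch below shows the kind of setup involved, with illustrative limits rather than the app's actual values.

```ts
import * as THREE from 'three';
import { OrbitControls } from 'three/examples/jsm/controls/OrbitControls.js';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(
  45, window.innerWidth / window.innerHeight, 0.1, 100,
);
camera.position.set(0, 1.6, 2.5); // roughly eye level, in front of the avatar

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

const controls = new OrbitControls(camera, renderer.domElement);
controls.target.set(0, 1.5, 0);       // orbit around the avatar's head
controls.enableDamping = true;        // smooth, inertial camera motion
controls.minDistance = 1;             // illustrative zoom limits
controls.maxDistance = 6;
controls.maxPolarAngle = Math.PI / 2; // keep the camera above the floor

function tick(): void {
  controls.update();                  // required when damping is enabled
  renderer.render(scene, camera);
  requestAnimationFrame(tick);
}
tick();
```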
- Modular and Extensible Architecture:
  - Structured JSON Instructions: Designed the codebase to be modular, so new scenarios and features can be added through structured JSON instructions (an example scenario definition follows this group); this keeps the app scalable and maintainable.
  - Environment and Lighting Control: Powered by Three.js, the app supports full scene interaction, including environment and lighting controls, allowing for customizable, immersive virtual settings.
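As an illustration of the structured-JSON approach, a scenario definition might look like the sketch below; the Scenario interface and its field names are assumptions for this example, not the shipped schema.

```ts
// Assumed shape of a scenario definition driving avatars, voices and the scene.
interface Scenario {
  id: string;
  systemPrompt: string;   // persona and rules handed to the LLM
  avatars: {
    model: string;        // path to the .glb/.fbx avatar
    voice: string;        // Azure TTS voice name
    animations: string[]; // subset of the local .fbx library it may use
  }[];
  environment: {
    background: string;
    lighting: 'studio' | 'day' | 'night';
  };
}

// Hypothetical pre-built scenario in the spirit of the news anchor mode.
const newsAnchorMode: Scenario = {
  id: 'news-anchor',
  systemPrompt: 'You are a concise news anchor. Summarize headlines on request.',
  avatars: [
    {
      model: 'assets/avatars/anchor.glb',
      voice: 'en-US-JennyNeural',
      animations: ['idle', 'talk_neutral', 'gesture_left'],
    },
  ],
  environment: { background: 'newsroom', lighting: 'studio' },
};
```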
- Avatar Creation and Compatibility:
  - Reallusion Character Creator: Created lifelike avatars with Reallusion Character Creator, ensuring high-quality, realistic character models that enhance user engagement.
  - WebGL Compatibility: Leveraged WebGL (built on OpenGL ES) for rendering compatibility across browsers and mobile platforms, ensuring wide accessibility and consistent performance.
Technologies Used
- Front-End: Angular, TypeScript, Three.js, WebGL, HTML5, CSS3, SASS/SCSS
- Back-End: Node.js, Express.js
- AI & Machine Learning: OpenAI Chat Completions API (GPT models), Microsoft Azure Text-to-Speech API, Google Speech-to-Text API, DALL-E 3
- Speech and Animation: Viseme Generation, Local .fbx Animation Library, Reallusion Character Creator
- DevOps & Tools: Git, CI/CD Pipelines, Docker, Zapier
- Additional Tools: WebSockets, RESTful APIs
- Performance & Optimization: Code Splitting, Tree-Shaking, Lazy Loading, Webpack Bundle Analyzer
Project Outcome
Humans 2.0 successfully delivers an interactive and engaging avatar experience, enabling users to communicate seamlessly with AI-driven avatars through both text and speech. The integration of real-time animation synchronization and advanced AI features creates lifelike and responsive interactions. The application's modular architecture allows for easy expansion and customization, supporting a variety of user scenarios and enhancing overall functionality. By leveraging modern web technologies and AI capabilities, Humans 2.0 stands out as a sophisticated tool for interactive digital experiences, offering potential for further advancements in full scene interaction and environment control.