We tested Anthropic’s new chatbot — and came away a bit disappointed

Disappointing Performance of Anthropic’s New Chatbot Model: Our Findings

Background

————————

Anthropic, a prominent AI startup funded by tech giants and venture capitalists, has introduced its latest chatbot model, Claude 3, which it claims outperforms OpenAI’s GPT-4 on several benchmarks. However, our team found that the average user’s experience does not match these impressive results.

Testing Methodology

——————

To evaluate the performance of Claude 3 Opus, we posed a series of questions covering various topics, from political controversies to medical queries. We compared its answers with those of Google’s Gemini Ultra, the GenAI model we evaluated previously, to assess improvements and identify potential weaknesses.

Context Window Capabilities

————————–

Claude 3 Opus boasts a large context window of 200,000 tokens, enabling it to process extended inputs without losing focus. This feature allows the model to maintain coherence throughout longer interactions.

Performance Evaluation

———————-

Our evaluation focused on three key areas: handling current events, providing historical context, and addressing medical and therapeutic queries. Let us examine each area individually.

Current Events

————

When asked about recent conflicts, such as the ongoing Israel-Palestine situation, or about emerging trends on popular social media platforms such as TikTok, Claude 3 struggled because its knowledge extends only up to August 2023.

Historical Context

————–

By contrast, when asked about historical events, such as the debate in Congress surrounding Prohibition, Claude 3 performed satisfactorily and suggested relevant primary source material.

Medical Advice

————-

For medical queries, the model performed acceptably, recommending medication and indicating the temperatures at which further medical attention may be required. Its approach to discussing weight and body diversity was commendably inclusive and unbiased.

Conclusion

———

Claude 3 Opus showed promise in certain respects, particularly its extensive context window and its ability to provide helpful suggestions. However, its limitations in dealing with current events and its lack of integration with external services detracted from its overall appeal.

Stay tuned for future evaluations of advanced AI technologies and their impact on everyday life. If you enjoyed reading this analysis, please consider sharing it with others interested in the intersection of technology and society.
