InferIQ
An LLM evaluation framework that leverages LLMs to evaluate other LLMs.
Generates answers to questions from a sample dataset across an evaluation pool of LLMs; a group of judge LLMs then assesses and rates each response. Results are visualized as graphs, alongside additional metrics such as BERTScore and inference time.
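The pipeline above can be sketched roughly as follows. This is a minimal illustration, not InferIQ's actual API: `generate_answer` and `judge_score` are hypothetical stubs standing in for real LLM calls, and the scoring scheme is a placeholder.

```python
import time
from statistics import mean

def generate_answer(model: str, question: str) -> str:
    # Hypothetical stub for a call to one LLM in the evaluation pool.
    return f"{model}'s answer to: {question}"

def judge_score(judge: str, question: str, answer: str) -> float:
    # Hypothetical stub for a judge LLM rating a response on a 1-5 scale.
    # A real implementation would prompt the judge model with the
    # question/answer pair; here a deterministic placeholder is used.
    return (len(answer) + len(judge)) % 5 + 1

def evaluate(questions, pool, judges):
    """Generate answers across the pool, have each judge rate them,
    and aggregate per-model judge scores and inference times."""
    results = {}
    for model in pool:
        scores, times = [], []
        for q in questions:
            start = time.perf_counter()
            answer = generate_answer(model, q)
            times.append(time.perf_counter() - start)
            # Average the ratings from all judge LLMs for this answer.
            scores.append(mean(judge_score(j, q, answer) for j in judges))
        results[model] = {
            "judge_score": mean(scores),
            "inference_time": mean(times),
        }
    return results
```

The per-model `judge_score` and `inference_time` aggregates returned here are the kind of values that would then feed the graphs and metric tables.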