Enhanced Calendar Assistant

Prompt Engineering Research - Spring 2024

Project Overview

This project extended a previous prototype of an LLM-powered calendar assistant by systematically applying prompt engineering techniques. I built an improved interface using React and Node.js, then implemented and evaluated several prompting strategies: self-consistency, chain-of-thought, few-shot, templating, and persona-based approaches.

The research included a comprehensive analysis comparing these techniques across multiple state-of-the-art language models: GPT-4 Turbo, Mistral Large, GPT-3.5, and Mistral 7B. Results showed that combining few-shot learning with templating yielded the most significant performance improvements, with self-consistency also demonstrating strong standalone results.

A key technical contribution was the development of a simplified JSON-based calendar representation that proved more effective than the previous iCalendar (ICAL) approach. The research offers practical insight into how prompt engineering can significantly improve LLM performance on structured tasks like calendar management.

Technologies

React
Node.js
JavaScript
Large Language Models
API Integration
GPT-4 Turbo
Mistral Large
Prompt Engineering
Few-shot Learning
JSON

From complex ICAL to simple JSON structure

ICAL Format (Complex)

BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
UID:uid1@example.com
DTSTAMP:19970714T170000Z
DTSTART:19970714T170000Z
END:VEVENT
END:VCALENDAR

JSON Format (Simple)

{
  "id": "event-123",
  "title": "Meeting",
  "start": "2024-05-05T10:00:00"
}
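A minimal sketch of the conversion from an ICAL VEVENT to the simplified JSON shape above. The field names (`id`, `title`, `start`) follow the JSON example; the parsing logic and the `veventToJson` helper are illustrative, not the project's actual converter.

```javascript
// Hypothetical helper: flatten a VEVENT's KEY:VALUE lines into the
// simplified JSON event representation used in this project.
function veventToJson(icalText) {
  const fields = {};
  for (const line of icalText.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx)] = line.slice(idx + 1);
  }
  // Convert an ICAL timestamp like 20240505T100000Z into ISO form.
  const toIso = (dt) =>
    dt
      ? dt.replace(
          /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z?$/,
          "$1-$2-$3T$4:$5:$6"
        )
      : undefined;
  return {
    id: fields["UID"],
    title: fields["SUMMARY"], // SUMMARY is the standard iCalendar title field
    start: toIso(fields["DTSTART"]),
  };
}
```

The simpler, flatter structure gives the model fewer tokens of boilerplate to reproduce, which is one plausible reason the JSON format outperformed ICAL.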

Prompt Engineering Techniques

Techniques evaluated (score gain over baseline):

  • Naive Baseline: basic task description (baseline)
  • Few-Shot: examples of queries (+0.4 pts)
  • Self-Consistency: multiple reasoning paths (+0.6 pts)
  • Template: structured format (+0.5 pts)
  • Few-Shot + Template: combined approach (+1.0 pts)

Few-Shot + Template (Best)

Combining worked examples with structured format guidance.
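The winning combination can be sketched as prompt construction code: a fixed output template plus a handful of query/answer examples prepended to the user's request. The template wording and the example event below are illustrative, not the exact prompts used in the study.

```javascript
// Output-format instruction (the "template" part of the technique).
const TEMPLATE =
  'Respond ONLY with JSON of the form {"id": "...", "title": "...", "start": "..."}.';

// Worked examples (the "few-shot" part); a real prompt would use several.
const FEW_SHOT_EXAMPLES = [
  {
    query: "Schedule a meeting on May 5th at 10am",
    answer:
      '{"id": "event-123", "title": "Meeting", "start": "2024-05-05T10:00:00"}',
  },
];

// Assemble template + examples + the new query into one prompt string.
function buildPrompt(userQuery) {
  const examples = FEW_SHOT_EXAMPLES.map(
    (ex) => `Query: ${ex.query}\nAnswer: ${ex.answer}`
  ).join("\n\n");
  return `${TEMPLATE}\n\n${examples}\n\nQuery: ${userQuery}\nAnswer:`;
}
```

Ending the prompt with `Answer:` nudges the model to complete with JSON directly, which keeps responses machine-parseable.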

Results & Conclusions

Performance across prompting techniques:

Baseline 6.0/10
Few-Shot 6.4/10
Self-Consistency 6.6/10
Template 6.5/10
Few-Shot + Template 7.0/10
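Self-Consistency, the strongest standalone technique above, samples several completions for the same prompt and keeps the majority answer. A minimal sketch, where `sampleModel` stands in for the actual LLM API call:

```javascript
// Sample the model n times and return the most frequent answer.
// `sampleModel(prompt)` is a hypothetical async function returning a string.
async function selfConsistent(sampleModel, prompt, n = 5) {
  const answers = await Promise.all(
    Array.from({ length: n }, () => sampleModel(prompt))
  );
  // Tally identical answers and pick the one with the most votes.
  const counts = new Map();
  for (const a of answers) counts.set(a, (counts.get(a) || 0) + 1);
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```

Majority voting over structured outputs works best when responses are normalized (e.g. canonical JSON key order) so that semantically identical answers compare equal.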

Key Findings

  • Combined Few-Shot + Template approach was most effective
  • Self-Consistency was the best standalone technique
  • JSON format significantly improved model performance vs ICAL
  • GPT-4 Turbo performed best overall