Enhanced Calendar Assistant

Prompt Engineering Research - Spring 2024

Project Overview

This project extended a previous prototype of an LLM-powered calendar assistant by systematically applying prompt engineering techniques. I built an improved interface using React and Node.js, then implemented and evaluated several prompting strategies: self-consistency, chain-of-thought, few-shot, templating, and persona-based approaches.

The research included a comprehensive analysis comparing these techniques across multiple state-of-the-art language models: GPT-4 Turbo, Mistral Large, GPT-3.5, and Mistral 7B. Results showed that combining few-shot learning with templating yielded the most significant performance improvements, with self-consistency also demonstrating strong standalone results.

A key technical contribution was the development of a simplified JSON-based calendar representation that proved more effective than the previous iCalendar (ICAL) approach. The research offers practical insight into how prompt engineering can significantly improve LLM performance on structured tasks like calendar management.

Technologies

React
Node.js
JavaScript
Large Language Models
API Integration
GPT-4 Turbo
Mistral Large
Prompt Engineering
Few-shot Learning
JSON

From complex ICAL to simple JSON structure

ICAL Format (Complex)

BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
UID:uid1@example.com
DTSTAMP:19970714T170000Z
DTSTART:19970714T170000Z
END:VEVENT
END:VCALENDAR

JSON Format (Simple)

{
  "id": "event-123",
  "title": "Meeting",
  "start": "2024-05-05T10:00:00"
}
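A minimal sketch of the conversion from an ICAL VEVENT to the simplified JSON shape above. The field names (`id`, `title`, `start`) follow the JSON example; the parsing logic and the `veventToJson` helper are illustrative, not the project's actual converter.

```javascript
// Hypothetical helper: flatten a VEVENT's KEY:VALUE lines into the
// simplified JSON event representation used in this project.
function veventToJson(icalText) {
  const fields = {};
  for (const line of icalText.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx)] = line.slice(idx + 1);
  }
  // Convert an ICAL timestamp like 20240505T100000Z into ISO form.
  const toIso = (dt) =>
    dt
      ? dt.replace(
          /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z?$/,
          "$1-$2-$3T$4:$5:$6"
        )
      : undefined;
  return {
    id: fields["UID"],
    title: fields["SUMMARY"], // SUMMARY is the standard iCalendar title field
    start: toIso(fields["DTSTART"]),
  };
}
```

The simpler, flatter structure gives the model fewer tokens of boilerplate to reproduce, which is one plausible reason the JSON format outperformed ICAL.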

Prompt Engineering Techniques

Techniques evaluated (score gain over baseline):

  • Naive Baseline: basic task description (baseline)
  • Few-Shot: examples of queries (+0.4 pts)
  • Self-Consistency: multiple reasoning paths (+0.6 pts)
  • Template: structured format (+0.5 pts)
  • Few-Shot + Template: combined approach (+1.0 pts)

Few-Shot + Template (Best)

Combining worked examples with structured format guidance.
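The winning combination can be sketched as prompt construction code: a fixed output template plus a handful of query/answer examples prepended to the user's request. The template wording and the example event below are illustrative, not the exact prompts used in the study.

```javascript
// Output-format instruction (the "template" part of the technique).
const TEMPLATE =
  'Respond ONLY with JSON of the form {"id": "...", "title": "...", "start": "..."}.';

// Worked examples (the "few-shot" part); a real prompt would use several.
const FEW_SHOT_EXAMPLES = [
  {
    query: "Schedule a meeting on May 5th at 10am",
    answer:
      '{"id": "event-123", "title": "Meeting", "start": "2024-05-05T10:00:00"}',
  },
];

// Assemble template + examples + the new query into one prompt string.
function buildPrompt(userQuery) {
  const examples = FEW_SHOT_EXAMPLES.map(
    (ex) => `Query: ${ex.query}\nAnswer: ${ex.answer}`
  ).join("\n\n");
  return `${TEMPLATE}\n\n${examples}\n\nQuery: ${userQuery}\nAnswer:`;
}
```

Ending the prompt with `Answer:` nudges the model to complete with JSON directly, which keeps responses machine-parseable.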

Results & Conclusions

Performance across prompting techniques:

Baseline 6.0/10
Few-Shot 6.4/10
Self-Consistency 6.6/10
Template 6.5/10
Few-Shot + Template 7.0/10
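Self-Consistency, the strongest standalone technique above, samples several completions for the same prompt and keeps the majority answer. A minimal sketch, where `sampleModel` stands in for the actual LLM API call:

```javascript
// Sample the model n times and return the most frequent answer.
// `sampleModel(prompt)` is a hypothetical async function returning a string.
async function selfConsistent(sampleModel, prompt, n = 5) {
  const answers = await Promise.all(
    Array.from({ length: n }, () => sampleModel(prompt))
  );
  // Tally identical answers and pick the one with the most votes.
  const counts = new Map();
  for (const a of answers) counts.set(a, (counts.get(a) || 0) + 1);
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```

Majority voting over structured outputs works best when responses are normalized (e.g. canonical JSON key order) so that semantically identical answers compare equal.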

Key Findings

  • Combined Few-Shot + Template approach was most effective
  • Self-Consistency was the best standalone technique
  • JSON format significantly improved model performance vs ICAL
  • GPT-4 Turbo performed best overall