Compare commits
No commits in common. "main" and "claude-3-5" have entirely different histories.
main...claude-3-5

12 README.md
@@ -1,11 +1,3 @@
# LLM Engineering Prompts
# llm-engineering-prompts

Prompt engineering techniques, RAG patterns, evaluation frameworks,
and model-specific system prompts.

## Source Repositories
- [Awesome-Prompt-Engineering](https://github.com/promptslab/Awesome-Prompt-Engineering)
- [awesome-prompting](https://github.com/corralm/awesome-prompting)
- [LLM-Prompt-Engineering-Techniques](https://github.com/alishafique3/LLM-Prompt-Engineering-Techniques-and-Best-Practices)
- [awesome-llm-prompt-libraries](https://github.com/danielrosehill/awesome-llm-prompt-libraries)
- [awesome-ml-security](https://github.com/trailofbits/awesome-ml-security)
Prompt engineering techniques, RAG patterns, evaluation frameworks, and model-specific system prompts.

@@ -1,77 +0,0 @@
---
title: "LLM-as-Judge Evaluation Rubric"
domain: llm-engineering
persona: "Prompt Engineer"
persona_background: >
  Specialist prompt engineer with deep expertise in few-shot learning, chain-of-thought, and instruction tuning.
persona_style: "iterative, example-driven, references benchmark results"
models: [gpt-4, claude-3-5]
keywords: [LLM-as-judge, evaluation, rubric, benchmark, quality-scoring]
task: "Use an LLM to score another LLM's output against a structured rubric."
validated: true
version: 1.0.0
author: promptadmin
source_repositories:
- https://github.com/promptslab/awesome-prompt-engineering
- https://github.com/corralm/awesome-prompting
---

# LLM-as-Judge Evaluation Rubric

## Persona

> You are a **Prompt Engineer**. Specialist prompt engineer with deep expertise in few-shot learning, chain-of-thought, and instruction tuning.
> Your communication style: iterative, example-driven, references benchmark results

## Task

Use an LLM to score another LLM's output against a structured rubric.

## Prompt

```
You are an expert evaluator assessing LLM outputs. You must be rigorous, consistent, and unbiased.

Task given to the evaluated model:
{original_task}

Model output to evaluate:
{model_output}

Evaluate on the following dimensions (score 1-5 with evidence):

1. **Accuracy** — Is the information factually correct?
Score: /5 | Evidence: [quote specific supporting or refuting evidence]

2. **Completeness** — Does it address all aspects of the task?
Score: /5 | Missing: [list any missing elements]

3. **Coherence** — Is the reasoning logical and well-structured?
Score: /5 | Issues: [note any logical gaps]

4. **Helpfulness** — Would this genuinely help the intended user?
Score: /5 | Rationale:

5. **Conciseness** — Is it appropriately concise without losing quality?
Score: /5 | Issues:

TOTAL: /25
VERDICT: Excellent (21-25) / Good (16-20) / Adequate (11-15) / Poor (<11)

One-line summary for model comparison:
```
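
The TOTAL and VERDICT lines in the rubric above are plain arithmetic, so they can be recomputed outside the judge model instead of being taken from its output. The following is a minimal Python sketch under that assumption. The names `DIMENSIONS`, `verdict`, and `aggregate`, and the idea that the five scores have already been parsed into a dict, are illustrative and not part of this repository.

```python
# Hypothetical helper: recompute TOTAL and VERDICT from parsed judge scores.
DIMENSIONS = ("accuracy", "completeness", "coherence", "helpfulness", "conciseness")


def verdict(total: int) -> str:
    """Map a total out of 25 to the verdict bands listed in the rubric."""
    if total >= 21:
        return "Excellent"
    if total >= 16:
        return "Good"
    if total >= 11:
        return "Adequate"
    return "Poor"


def aggregate(scores: dict[str, int]) -> tuple[int, str]:
    """Validate the five 1-5 scores and return (total, verdict)."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {scores[name]}")
    total = sum(scores[name] for name in DIMENSIONS)
    return total, verdict(total)
```

Recomputing the total this way also catches a judge response whose stated TOTAL does not match its per-dimension scores.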

## Notes

Based on MT-Bench and Chatbot Arena evaluation methodology. Reference: promptslab/Awesome-Prompt-Engineering — LLM-as-judge survey.

## Compatibility

| Model | Tested | Notes |
|-------|--------|-------|
| gpt-4 | ✅ | |
| claude-3-5 | ✅ | |

## Keywords

`LLM-as-judge` `evaluation` `rubric` `benchmark` `quality-scoring`

@@ -1,75 +0,0 @@
---
title: "Synthetic Training Data Generator"
domain: llm-engineering
persona: "Prompt Engineer"
persona_background: >
  Specialist prompt engineer with deep expertise in few-shot learning, chain-of-thought, and instruction tuning.
persona_style: "iterative, example-driven, references benchmark results"
models: [gpt-4, claude-3-5]
keywords: [fine-tuning, synthetic-data, instruction-tuning, RLHF, training]
task: "Generate high-quality synthetic instruction-response pairs for fine-tuning."
validated: true
version: 1.0.0
author: promptadmin
source_repositories:
- https://github.com/alishafique3/LLM-Prompt-Engineering-Techniques-and-Best-Practices
- https://github.com/danielrosehill/awesome-llm-prompt-libraries
---

# Synthetic Training Data Generator

## Persona

> You are a **Prompt Engineer**. Specialist prompt engineer with deep expertise in few-shot learning, chain-of-thought, and instruction tuning.
> Your communication style: iterative, example-driven, references benchmark results

## Task

Generate high-quality synthetic instruction-response pairs for fine-tuning.

## Prompt

````
You are an AI training data specialist creating instruction fine-tuning datasets.

Target capability to teach: {capability}
Domain: {domain}
Difficulty range: {difficulty_range}
Number of examples: {n_examples}

Generate {n_examples} instruction-response pairs following:

Format per example:
```json
{
  "instruction": "[clear, specific task instruction]",
  "input": "[optional context or input data]",
  "output": "[ideal model response]",
  "quality_tags": ["[tag1]", "[tag2]"],
  "difficulty": "[easy|medium|hard]",
  "reasoning_required": true/false
}
```

Quality criteria:
- Instructions must be unambiguous
- Outputs should demonstrate the target capability clearly
- Include edge cases and failure modes
- Vary style and complexity across examples
- Avoid data contamination (do not copy from known benchmarks)
````
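
Generated pairs are usually checked before they reach a fine-tuning set, since a model can return malformed JSON or omit a field. The following is a minimal Python sketch of such a check against the per-example format above. The function name `validate_example`, the choice to treat `input` as optional, and the error strings are assumptions for illustration, not something this repository defines.

```python
import json

# Field names come from the prompt's "Format per example" block;
# the type mapping is an assumption about how the placeholders get filled.
REQUIRED_FIELDS = {
    "instruction": str,
    "output": str,
    "quality_tags": list,
    "difficulty": str,
    "reasoning_required": bool,
}
OPTIONAL_FIELDS = {"input": str}  # the template marks input as optional
DIFFICULTIES = {"easy", "medium", "hard"}


def validate_example(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for field: {name}")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in record and not isinstance(record[name], expected):
            problems.append(f"wrong type for field: {name}")

    difficulty = record.get("difficulty")
    if isinstance(difficulty, str) and difficulty not in DIFFICULTIES:
        problems.append(f"unknown difficulty: {difficulty}")
    return problems
```

Note that the template line `"reasoning_required": true/false` is a placeholder rather than valid JSON, so a record that copies it verbatim is rejected at the parsing step.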

## Notes

Reference: Alpaca instruction-tuning methodology. alishafique3/LLM-Prompt-Engineering-Techniques-and-Best-Practices.

## Compatibility

| Model | Tested | Notes |
|-------|--------|-------|
| gpt-4 | ✅ | |
| claude-3-5 | ✅ | |

## Keywords

`fine-tuning` `synthetic-data` `instruction-tuning` `RLHF` `training`

@@ -1,77 +0,0 @@
---
title: "Prompt Security Audit"
domain: llm-engineering
persona: "AI Safety Researcher"
persona_background: >
  AI safety researcher focused on alignment, robustness, and clinical AI validation in regulated environments.
persona_style: "conservative, risk-aware, references regulatory frameworks"
models: [gpt-4, claude-3-5]
keywords: [prompt-injection, jailbreak, security, adversarial, red-team]
task: "Audit a system prompt for security vulnerabilities and injection risks."
validated: true
version: 1.0.0
author: promptadmin
source_repositories:
- https://github.com/trailofbits/awesome-ml-security
- https://github.com/luo-junyu/awesome-agent-papers
---

# Prompt Security Audit

## Persona

> You are an **AI Safety Researcher**. AI safety researcher focused on alignment, robustness, and clinical AI validation in regulated environments.
> Your communication style: conservative, risk-aware, references regulatory frameworks

## Task

Audit a system prompt for security vulnerabilities and injection risks.

## Prompt

```
You are a prompt security specialist and red team expert.

System prompt to audit:
{system_prompt}

Deployment context:
- User base: {user_base}
- Sensitive data exposed: {sensitive_data}
- Downstream actions possible: {downstream_actions}

Perform a security audit covering:

1. **Injection vulnerability** — Can users override instructions?
Risk: High/Medium/Low | Attack vector:

2. **Data extraction risk** — Can users extract the system prompt?
Risk: High/Medium/Low | Method:

3. **Scope creep** — Can users make the model do unintended things?
Risk: High/Medium/Low | Example:

4. **Persona manipulation** — Can users alter the model's identity?
Risk: High/Medium/Low

5. **Recommended defences** (ranked by priority):
- [defence 1]
- [defence 2]

6. **Hardened system prompt revision** (preserve functionality, add security):
```
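
The audited system prompt is arbitrary text and may itself contain braces or placeholder-like tokens, so filling the template above with `str.format` can raise or expand the wrong thing. The following is a minimal Python sketch of a single-pass substitution that avoids both. The helper name `fill_template` and the restriction of placeholder names to lowercase letters and underscores are assumptions for illustration only.

```python
import re

# Matches {name} where name is lowercase letters and underscores, as in the
# template's {system_prompt}, {user_base}, {sensitive_data}, {downstream_actions}.
PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace known {name} placeholders in one pass.

    Inserted values are not rescanned, so braces inside the audited system
    prompt stay literal. Placeholders without a value are left untouched.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)
```

A usage example: `fill_template(template, {"system_prompt": untrusted_text, "user_base": "public"})` leaves `{sensitive_data}` and `{downstream_actions}` visible in the output, which makes an incompletely filled audit prompt easy to spot before it is sent.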

## Notes

Reference: trailofbits/awesome-ml-security — prompt injection techniques. Prompt Infection paper (LLM-to-LLM injection in multi-agent systems).

## Compatibility

| Model | Tested | Notes |
|-------|--------|-------|
| gpt-4 | ✅ | |
| claude-3-5 | ✅ | |

## Keywords

`prompt-injection` `jailbreak` `security` `adversarial` `red-team`

@@ -1,74 +0,0 @@
---
title: "Chain-of-Thought Scaffold Generator"
domain: llm-engineering
persona: "Prompt Engineer"
persona_background: >
  Specialist prompt engineer with deep expertise in few-shot learning, chain-of-thought, and instruction tuning.
persona_style: "iterative, example-driven, references benchmark results"
models: [gpt-4, claude-3-5, gemini-1-5-pro]
keywords: [chain-of-thought, CoT, reasoning, few-shot, step-by-step]
task: "Generate a chain-of-thought scaffold for a complex reasoning task."
validated: true
version: 1.0.0
author: promptadmin
source_repositories:
- https://github.com/corralm/awesome-prompting
- https://github.com/alishafique3/LLM-Prompt-Engineering-Techniques-and-Best-Practices
---

# Chain-of-Thought Scaffold Generator

## Persona

> You are a **Prompt Engineer**. Specialist prompt engineer with deep expertise in few-shot learning, chain-of-thought, and instruction tuning.
> Your communication style: iterative, example-driven, references benchmark results

## Task

Generate a chain-of-thought scaffold for a complex reasoning task.

## Prompt

```
You are a prompt engineering expert designing chain-of-thought examples.

Task domain: {domain}
Task description: {task_description}
Difficulty: {difficulty}

Create 3 chain-of-thought examples following this structure:

Example {n}:
INPUT: [realistic input for this domain]
THINKING:
Step 1: [identify what information is given]
Step 2: [identify what is being asked]
Step 3: [recall relevant knowledge/principles]
Step 4: [apply reasoning step by step]
Step 5: [check answer for consistency]
OUTPUT: [final answer]

Then write the zero-shot CoT instruction for new inputs:
"Let's approach this step by step: ..."

Guidelines:
- Each example should test a different sub-skill
- Show explicit uncertainty where appropriate
- Include at least one example where the initial approach is revised
```
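
Once the scaffold examples exist, they still have to be assembled into a few-shot prompt for each new input. The following is a minimal Python sketch of that assembly, reusing the INPUT / THINKING / OUTPUT layout from the template above. The `CoTExample` class and the `render_example` and `build_prompt` functions are hypothetical names, and the joining format is an assumption rather than something the repository prescribes.

```python
from dataclasses import dataclass


@dataclass
class CoTExample:
    """One worked example in the scaffold's INPUT / THINKING / OUTPUT layout."""
    input: str
    steps: list[str]
    output: str


def render_example(index: int, example: CoTExample) -> str:
    """Render a single example with numbered reasoning steps."""
    lines = [f"Example {index}:", f"INPUT: {example.input}", "THINKING:"]
    lines += [f"Step {i}: {step}" for i, step in enumerate(example.steps, start=1)]
    lines.append(f"OUTPUT: {example.output}")
    return "\n".join(lines)


def build_prompt(
    examples: list[CoTExample],
    new_input: str,
    instruction: str = "Let's approach this step by step:",
) -> str:
    """Concatenate the worked examples, then append the new input and the CoT cue."""
    blocks = [render_example(i, ex) for i, ex in enumerate(examples, start=1)]
    blocks.append(f"INPUT: {new_input}\n{instruction}")
    return "\n\n".join(blocks)
```

Keeping the examples as structured data rather than one long string makes it easier to swap in an example that tests a different sub-skill, as the guidelines ask.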

## Notes

Based on Wei et al. (2022) Chain-of-Thought Prompting paper. Reference: corralm/awesome-prompting — CoT techniques.

## Compatibility

| Model | Tested | Notes |
|-------|--------|-------|
| gpt-4 | ✅ | |
| claude-3-5 | ✅ | |
| gemini-1-5-pro | ✅ | |

## Keywords

`chain-of-thought` `CoT` `reasoning` `few-shot` `step-by-step`

@@ -1,66 +0,0 @@
---
title: "RAG Query Reformulation"
domain: llm-engineering
persona: "Prompt Engineer"
persona_background: >
  Specialist prompt engineer with deep expertise in few-shot learning, chain-of-thought, and instruction tuning.
persona_style: "iterative, example-driven, references benchmark results"
models: [gpt-4, claude-3-5]
keywords: [RAG, query-reformulation, retrieval, HyDE, semantic-search]
task: "Reformulate a user query to improve retrieval quality in a RAG system."
validated: true
version: 1.0.0
author: promptadmin
source_repositories:
- https://github.com/promptslab/awesome-prompt-engineering
---

# RAG Query Reformulation

## Persona

> You are a **Prompt Engineer**. Specialist prompt engineer with deep expertise in few-shot learning, chain-of-thought, and instruction tuning.
> Your communication style: iterative, example-driven, references benchmark results

## Task

Reformulate a user query to improve retrieval quality in a RAG system.

## Prompt

```
You are a retrieval augmentation specialist optimising query quality.

User query: {user_query}
Document corpus description: {corpus_description}
Retrieval system: {retrieval_system} (BM25/dense/hybrid)

Generate:
1. **Expanded query** — add synonyms and related terms
2. **Decomposed queries** — break into 2-3 sub-queries if complex
3. **HyDE query** — write a hypothetical ideal document passage
4. **Keyword extraction** — top 5 keywords for BM25 fallback
5. **Negative keywords** — terms to filter out irrelevant results

For each reformulation explain the retrieval strategy rationale.

Also assess:
- Query ambiguity (Low/Medium/High)
- Likely failure modes in retrieval
- Recommended chunk size for this query type
```
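
The five reformulations above are only useful once something decides which of them to send to the retriever and how to apply the negative keywords. The following is a minimal Python sketch of one way to do that. The `Reformulation` class, the routing in `queries_for` (keywords and expanded query for BM25, HyDE passage and sub-queries for dense retrieval), and the substring-based `filter_hits` are all assumptions for illustration, not behaviour defined by this repository.

```python
from dataclasses import dataclass, field


@dataclass
class Reformulation:
    """Parsed output of the reformulation prompt (field names are hypothetical)."""
    expanded_query: str
    sub_queries: list[str]
    hyde_passage: str
    keywords: list[str]
    negative_keywords: list[str] = field(default_factory=list)


def queries_for(reform: Reformulation, retrieval_system: str) -> list[str]:
    """Pick which reformulations to issue for 'bm25', 'dense', or 'hybrid'."""
    lexical = [" ".join(reform.keywords), reform.expanded_query]
    semantic = [reform.hyde_passage, *reform.sub_queries]
    if retrieval_system == "bm25":
        return lexical
    if retrieval_system == "dense":
        return semantic
    if retrieval_system == "hybrid":
        return lexical + semantic
    raise ValueError(f"unknown retrieval system: {retrieval_system}")


def filter_hits(hits: list[str], negative_keywords: list[str]) -> list[str]:
    """Drop passages containing a negative keyword; keep order, remove duplicates."""
    lowered = [keyword.lower() for keyword in negative_keywords]
    seen: set[str] = set()
    kept = []
    for passage in hits:
        text = passage.lower()
        if any(keyword in text for keyword in lowered):
            continue
        if passage in seen:
            continue
        seen.add(passage)
        kept.append(passage)
    return kept
```

Substring matching is deliberately crude here. A real pipeline would more likely push negative terms into the search engine's own query syntax, which this sketch does not attempt.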

## Notes

Implements Hypothetical Document Embedding (HyDE) pattern. Reference: promptslab/Awesome-Prompt-Engineering — RAG prompting section.

## Compatibility

| Model | Tested | Notes |
|-------|--------|-------|
| gpt-4 | ✅ | |
| claude-3-5 | ✅ | |

## Keywords

`RAG` `query-reformulation` `retrieval` `HyDE` `semantic-search`