Accenture Federal Services · GenAI Applications Engineer · Agents & RAG · Req 6718
What you say when they say "tell me about yourself."
"So my background is a mix of engineering and consulting. I started in product at a law firm, building compliance systems that tracked regulations across multiple jurisdictions. That's where I learned how to build things that have to be right — where a mistake means legal exposure, not just a bug."
"For the last two years I've been deep in GenAI. I built and deployed a multi-agent RAG platform — it's live right now. Python backend, LangGraph workflow, ChromaDB for vector search. A supervisor agent routes queries to retrieval, search, and synthesis agents. It streams responses back to the user. I can walk you through it if you want."
"What I'm looking for is the kind of work where the reliability bar is high. Where you can't just throw something over the wall. That's what attracted me to this role."
About 140 words. Say it out loud 5 times tonight until it sounds like talking.
toolchain.vercel.app — live and defensible
This is the system you walk them through when they ask "show me something you built." It has everything the JD asks for.
The user asks a question. A supervisor agent classifies the intent and routes to the right specialist. If it's a tool question, the RAG agent queries ChromaDB for the top 5 matches. If it needs fresh info, the search agent calls Tavily. Then the explain agent synthesizes everything into a structured markdown response that streams back to the user.
• Max 5 iterations then force-finish (prevents infinite loops)
• Pydantic catches malformed LLM output before it propagates
• Embedding fallback with explicit logging
• Structured logs via structlog on every agent
• Prometheus metrics tracked per agent
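If they ask you to whiteboard the routing loop and the iteration cap, a minimal sketch could look like this. It is not the ToolChainDev source: `classify`, the agent functions, and the state keys are hypothetical stand-ins, and the real supervisor decision would come from an LLM inside a LangGraph graph rather than from plain Python.

```python
from typing import Callable

MAX_ITERATIONS = 5  # the cap from the list above


def classify(state: dict) -> str:
    """Hypothetical stand-in for the LLM-backed supervisor decision."""
    if "context" not in state:
        return "search" if state.get("needs_fresh_info") else "rag"
    if "answer" not in state:
        return "explain"
    return "finish"


def rag_agent(state: dict) -> dict:
    return {**state, "context": ["top matches from the vector store"]}


def search_agent(state: dict) -> dict:
    return {**state, "context": ["fresh web results"]}


def explain_agent(state: dict) -> dict:
    return {**state, "answer": f"Summary of {len(state['context'])} source(s)"}


AGENTS: dict[str, Callable[[dict], dict]] = {
    "rag": rag_agent,
    "search": search_agent,
    "explain": explain_agent,
}


def run(state: dict, router: Callable[[dict], str] = classify) -> dict:
    """Route until the supervisor says finish, or force-finish at the cap."""
    for step in range(MAX_ITERATIONS):
        route = router(state)
        if route == "finish":
            return {**state, "iterations": step, "forced": False}
        state = AGENTS[route](state)
    # Cap reached: stop instead of looping forever.
    return {**state, "iterations": MAX_ITERATIONS, "forced": True}
```

The point to make out loud: every specialist hands control back to the supervisor, and the cap is enforced outside the model, so a confused router cannot spin indefinitely.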
"Each AI tool is a natural semantic unit. Name, provider, category, pricing, pros and cons. So I use document-structure chunking, not fixed tokens. For unstructured docs I'd use recursive splitting, 512 tokens, 50 overlap."
When they ask "tell me about X," find the matching card.
One-liner: "A multi-agent RAG platform I built and shipped. LangGraph supervisor routes queries to a RAG agent over a vector DB, a search agent that calls external APIs, and an explain agent that synthesizes the answer."
What it does: "It helps developers discover and compare AI tools. You ask 'best vector databases for RAG,' the supervisor routes to retrieval, the RAG agent pulls the top 5 from my indexed database, and the explain agent writes back a structured comparison."
Why it matters: "The patterns transfer. Supervisor routing, guardrails, structured output, streaming. That's the same shape you'd build for a mission system — just different data and stricter constraints."
One-liner: "An SMS marketing platform with TCPA compliance baked in. Quiet-hours logic, automated opt-out, consent audit trails."
Why TCPA matters: "TCPA is the closest consumer-side analog to federal compliance. You encode legal rules into automated guardrails. Don't message outside allowed hours. Process opt-outs instantly. Log every consent action."
If they ask about regulated work: "TCPA compliance is the same pattern — encode the legal rules into automated guardrails, log everything, deny by default. I've shipped that pattern in production."
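If they want the guardrail pattern made concrete, here is a small sketch of "deny by default, log everything." The 8 a.m. to 9 p.m. recipient-local window is an assumption used for illustration, and `Recipient`, `may_send`, and the log fields are hypothetical names, not the production code.

```python
from dataclasses import dataclass
from datetime import datetime, time

# Assumed allowed window, in the recipient's local time.
ALLOWED_START = time(8, 0)
ALLOWED_END = time(21, 0)


@dataclass
class Recipient:
    has_consent: bool
    opted_out: bool


def may_send(recipient: Recipient, local_now: datetime, audit_log: list) -> bool:
    """Allow a message only when every rule passes; log every decision."""
    if recipient.opted_out:
        allowed, reason = False, "opted_out"
    elif not recipient.has_consent:
        allowed, reason = False, "no_consent"
    elif not (ALLOWED_START <= local_now.time() < ALLOWED_END):
        allowed, reason = False, "quiet_hours"
    else:
        allowed, reason = True, "allowed"
    audit_log.append(
        {"at": local_now.isoformat(), "allowed": allowed, "reason": reason}
    )
    return allowed
```

Only the final branch permits a send, and the audit entry is written on every path, including denials.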
Do not name the client.
One-liner: "I built a compliance platform for a regulated industry client tracking regulations across multiple jurisdictions in real time."
What it tracked: "Per-jurisdiction licensing requirements, expiration dates, audit trails. Voice agent for natural-language queries. Elasticsearch for full-text search across the regulation corpus."
Why it matters: "Same problem federal missions face. Data classified by jurisdiction. Audit-ready logging. Multi-source regulation tracking."
If they ask the client name or industry: "Under NDA." If they push: "The architecture pattern is what matters."
One-liner: "An AI learning platform that uses RAG to ground generated content in source material. The interesting part is the context engineering system. 177 structured skills that guide agent behavior."
What context engineering means: "Treating prompts as a system, not a string. Each skill is a markdown file with trigger conditions, step-by-step instructions, and pitfalls. The agent loads the right skills contextually."
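A rough sketch of "the agent loads the right skills contextually," in case they ask how that works mechanically. It assumes triggers are simple keywords and skills are already parsed from their markdown files; `Skill`, `select_skills`, and `build_prompt` are hypothetical names, and the real trigger logic may be richer than substring matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: tuple[str, ...]  # keywords that activate the skill (assumed format)
    instructions: str          # body of the skill's markdown file


def select_skills(task: str, library: list[Skill], limit: int = 3) -> list[Skill]:
    """Return up to `limit` skills whose triggers appear in the task text."""
    text = task.lower()
    matched = [s for s in library if any(t in text for t in s.triggers)]
    return matched[:limit]


def build_prompt(task: str, library: list[Skill]) -> str:
    """Prepend only the matching skills, then the task itself."""
    sections = [
        f"## Skill: {s.name}\n{s.instructions}"
        for s in select_skills(task, library)
    ]
    return "\n\n".join(sections + [f"## Task\n{task}"])
```

Because each skill is a separate file, it can be versioned and tested on its own, which is the link to prompt versioning in the federal bridge.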
The federal bridge: "Federal needs prompt versioning and policy-as-code. Context engineering is the precursor. Structured, versioned, testable prompt libraries."
Every line from the job description, mapped to a sentence you can say.
"I use LangGraph with a StateGraph and Pydantic-typed router decisions. Supervisor pattern routing to specialist agents."
"ChromaDB in ToolChainDev. Vectorize in a second system. For federal I'd pick pgvector or OpenSearch. They run on existing PostgreSQL and reduce attack surface."
"I've documented these platforms and their patterns. Haven't shipped production on them yet. That's where I'd partner with the team to get up to speed fast."
"FastAPI backend, LangGraph workflow, Pydantic for structured output. That's my primary stack."
"Per-agent tool scoping. Search agent only sees web search. RAG agent only sees the vector store. No blanket access."
"Prometheus on every agent decision. Structured logs. Sentry for errors. Champion-challenger for prompt A/B testing. Auto-rollback on guardrail breach."
"Faithfulness above 95%. P95 latency under 3 seconds. Guardrail violation rate under 0.1%. Cost per query under 5 cents."
"Open-source model on-prem via vLLM. Local embeddings. pgvector in existing PostgreSQL. Trade-off is model quality versus data sovereignty."
"Every component authenticates every other. Audit trail on every query. I've shipped audit-first compliance systems in production."
"Shipped Docker. Reading-level familiarity with Kubernetes and Terraform. I pick up infrastructure stacks fast."
"Vertex is Google's equivalent of Bedrock. Gemini, Claude via Model Garden, open models like Llama. GCP is FedRAMP High authorized. If the agency is on Google, Vertex is the natural pick."
"I use n8n for visual workflow orchestration when I need to wire APIs, webhooks, and data pipelines fast. Different tool, same patterns. LangGraph for complex agents, n8n for quick integrations."
"For air-gapped: Llama or Mistral via vLLM, local embeddings via sentence-transformers or BGE, pgvector for vector search in existing PostgreSQL. No data egress. Trade-off is model quality versus data sovereignty."
"PII detection at ingestion. Redaction before embedding. Provenance tracking. Human-in-the-loop on low-confidence outputs."
Confirmed by Medium, LinkedIn, Dataford, DataCamp, and InterviewBit.
If you have 1 hour: 30 min on RAG, 15 on multi-agent, 10 on hallucinations, 5 on chunking.
Say your answer first, then check.
RAG combines information retrieval with generative models. It retrieves relevant documents from a knowledge base using vector search, then uses a generative model to synthesize an answer grounded in that retrieved context.
It grounds outputs in actual data. More factual, domain-specific responses without retraining. For federal use cases where data is sensitive or changes frequently, RAG is the right tool.
Standard LLM generation relies on pre-trained knowledge, frozen at training time. RAG retrieves current or proprietary information from a knowledge base at query time and passes it to the model as context for generation.
This reduces hallucinations, provides domain-specific answers, and adapts to dynamic content without retraining. For federal missions, RAG lets you keep sensitive data in your own environment while using a general-purpose model.
Multi-hop retrieval sequentially retrieves context across multiple documents or steps. Instead of one search then answer, it's: retrieve document A, extract a clue, search for document B, synthesize.
Useful for complex queries requiring synthesis across sources. "Compare compliance requirements in jurisdictions X and Y" requires retrieving X's rules, then Y's, then comparing.
Vector databases store high-dimensional embeddings and enable efficient similarity search via approximate nearest neighbor (ANN) indexes. They allow fast retrieval of semantically similar documents.
Without them, you'd compute similarity against every document on every query. Doesn't scale. In my system I use ChromaDB. For federal scale I'd use OpenSearch or pgvector.
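To make "similarity against every document on every query" concrete, here is the brute-force version a vector database replaces. It is a plain-Python sketch with hypothetical names (`cosine`, `top_k`). It scans every stored vector on each query, which is exactly the cost an ANN index avoids.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: list[float], docs: dict[str, list[float]], k: int = 5
) -> list[tuple[str, float]]:
    """Score every document against the query and keep the k best."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Each query here is linear in the number of documents. ANN indexes trade a small amount of recall for much faster lookups.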
1. Grounding — Answer only from retrieved context. System prompt: "Answer based ONLY on provided context."
2. Low temperature — 0.1 to 0.3 for factual retrieval.
3. Confidence thresholds — Below threshold, return "no results."
4. Citation enforcement — Agent references which sources it's drawing from.
5. Post-generation validation — Verify claims against source chunks.
6. Human-in-the-loop — High-stakes outputs route to review.
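Techniques 1, 3, and 4 from the list above can be shown together in one small function. This is a sketch under assumptions: the 0.75 cutoff is an arbitrary illustrative threshold, and `build_grounded_prompt` is a hypothetical helper. Low temperature would be set on the model call itself, which is not shown.

```python
from typing import Optional

GROUNDING_RULE = "Answer based ONLY on provided context. Cite sources as [n]."


def build_grounded_prompt(
    question: str, hits: list[tuple[str, float]], min_score: float = 0.75
) -> Optional[str]:
    """Keep chunks at or above the threshold; return None when nothing qualifies."""
    kept = [text for text, score in hits if score >= min_score]
    if not kept:
        return None  # caller should answer "no results" instead of guessing
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(kept, start=1))
    return f"{GROUNDING_RULE}\n\nContext:\n{context}\n\nQuestion: {question}"
```

Numbering the chunks gives the model something to cite, and it gives a post-generation validator something to check against.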
Supervisor pattern. Email intake agent classifies intent and extracts entities. Database agent queries internal systems via structured tool calls. Response drafting agent generates a grounded response. Guardrail agent validates output. Human review for sensitive cases.
Supervisor manages flow: circuit breakers, retries, fallback. Each agent has its own tool scope.
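"Each agent has its own tool scope" is easy to show as an allow-list checked before every call. This is a sketch with hypothetical agent and tool names. A real system would register actual tool functions and likely log each denial.

```python
# Allow-list per agent; anything not listed is denied.
TOOL_SCOPES = {
    "search": frozenset({"web_search"}),
    "rag": frozenset({"vector_query"}),
}

# Hypothetical tool implementations standing in for real integrations.
TOOLS = {
    "web_search": lambda query: f"web results for {query!r}",
    "vector_query": lambda query: f"vector matches for {query!r}",
}


class ToolNotAllowed(PermissionError):
    """Raised when an agent calls a tool outside its scope."""


def call_tool(agent: str, tool: str, query: str) -> str:
    allowed = TOOL_SCOPES.get(agent, frozenset())  # unknown agents get nothing
    if tool not in allowed:
        raise ToolNotAllowed(f"{agent} may not call {tool}")
    return TOOLS[tool](query)
```

The check lives in the dispatcher, not in the prompt, so a prompt injection cannot talk an agent into a tool it was never given.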
LangGraph extends LangChain with graph-based orchestration. Instead of linear chains, you define a StateGraph with nodes (agents) and conditional edges (routing).
In my system: supervisor, search, rag, explain nodes. Supervisor routes conditionally. All specialists return to supervisor. This cyclic flow is what linear chains can't do.
Key advantage: typed state object carries context between nodes.
Multiple layers. Input: prompt injection defense, PII detection. Tool scope: each agent only sees its own tools. Output: structured output enforcement, content filtering.
Loop prevention: max 5 iterations, max tokens, max tool calls. Cost guardrails: token budget. Audit logging: every decision logged. Circuit breakers on failures.
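Structured output enforcement, sketched with the standard library only. In a Pydantic-based stack a typed model would do this validation. The hand-rolled checks here are a stand-in so the example has no dependencies, and the `route` and `reason` fields are assumed names for a router decision.

```python
import json

ALLOWED_ROUTES = {"rag", "search", "explain", "finish"}


def parse_router_decision(raw: str) -> dict:
    """Validate raw model output before any downstream agent sees it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    route = data.get("route")
    if not isinstance(route, str) or route not in ALLOWED_ROUTES:
        raise ValueError(f"unknown route: {route!r}")
    reason = data.get("reason", "")
    if not isinstance(reason, str):
        raise ValueError("reason must be a string")
    return {"route": route, "reason": reason}
```

A `ValueError` here is the signal to retry, fall back, or force-finish, rather than letting a malformed decision propagate.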
Fixed-size — split at N tokens. Quick prototyping.
Recursive — paragraph then sentence then word. General purpose.
Semantic — split where meaning shifts. Long-form.
Document-structure — headings and sections. Regulations, legal.
Late chunking — embed full doc first, then chunk. Preserves context.
For my tool data: document-structure. For unstructured: recursive 512-token with 50 overlap.
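The 512/50 setting can be illustrated with a sliding window. Note the simplifications: this is the fixed-window core only, not a full recursive splitter that tries paragraph and sentence boundaries first, and it assumes the text has already been tokenized. `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 50) -> list[list]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end
    return chunks
```

The overlap means a sentence that straddles a boundary appears whole in at least one chunk, at the cost of storing roughly 10% more tokens.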
Retrieval: precision, recall, NDCG@k (most relevant ranked highest).
Generation: faithfulness (grounded in context?), answer relevance (addresses question?).
Golden test set of 50-100 Q&A pairs. LLM-as-a-judge for faithfulness. The RAG triad (context relevance, groundedness, answer relevance) is my north star.
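If they push on NDCG@k, being able to write it down helps. This sketch uses the linear-gain form (relevance divided by log2 of rank plus one). Some references use an exponential gain instead. It also computes the ideal ordering from the judged results passed in, which is a simplification when relevant documents were never retrieved.

```python
import math


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: later ranks contribute less."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """DCG of the top k as ranked, divided by DCG of the best possible order."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal else 0.0
```

Pass in graded relevance labels in the order the retriever returned them. A perfect ranking scores 1.0, and the score drops as relevant results slide down the list.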
Quality: faithfulness above 95%, relevance above 90%.
Latency: p95 under 3 seconds.
Safety: violation rate under 0.1%.
Reliability: uptime 99.9%.
Cost: under 5 cents per query.
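The targets above translate directly into a check that could gate a release or trigger a rollback. This is a sketch: metric names are hypothetical, boundary values are treated as passing (a simplification of "above" and "under"), and a missing metric counts as a breach.

```python
# (direction, target): "min" means the value must be at least the target,
# "max" means it must be at most the target.
SLOS = {
    "faithfulness": ("min", 0.95),
    "relevance": ("min", 0.90),
    "p95_latency_s": ("max", 3.0),
    "violation_rate": ("max", 0.001),
    "uptime": ("min", 0.999),
    "cost_per_query_usd": ("max", 0.05),
}


def slo_breaches(measured: dict[str, float]) -> list[str]:
    """Return the names of every SLO that is missing or out of bounds."""
    breaches = []
    for name, (direction, target) in SLOS.items():
        value = measured.get(name)
        if value is None:
            breaches.append(name)  # no data is treated as a failure
        elif direction == "min" and value < target:
            breaches.append(name)
        elif direction == "max" and value > target:
            breaches.append(name)
    return breaches
```

An empty list means all targets are met. Anything else names exactly which indicator to investigate.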
For prototyping: Chroma or FAISS. For production: pgvector (runs on PostgreSQL, reduces attack surface, helps ATO) or OpenSearch (FedRAMP-authorized, hybrid search).
Avoid Pinecone for air-gapped. Key federal constraint: can it run on-prem?
Memorize this order. If they ask "walk me through your pipeline," say these in order.
1. Ingest — Load documents. Clean and normalize text.
2. Chunk — Split into pieces. 512 tokens, 50 overlap. Preserve section boundaries.
3. Embed — Generate vectors. OpenAI 1536d or local sentence-transformers.
4. Store — Vector store. ChromaDB, pgvector, or OpenSearch.
5. Retrieve — Query embedding, top-K similarity, metadata filtering, reranking.
6. Generate — Retrieved context plus grounded prompt. Low temperature.
7. Validate — Citation check, hallucination check, guardrail check. Deliver with citations.
7 words: Ingest, Chunk, Embed, Store, Retrieve, Generate, Validate.
Federal mindset is not startup mindset.
DON'T
"I used GPT-4 for everything"
SAY
"I evaluate models across quality, safety, latency, cost. For federal I'd add FedRAMP as a gate."
DON'T
"I move fast and break things"
SAY
"I ship in weeks with guardrails, monitoring, and rollback capability."
DON'T
"I'm expert in ATO/STIGs"
SAY
"Working familiarity. I understand the constraints and would partner with security teams."
DON'T
"It worked well"
SAY
"I evaluated with NDCG@k and faithfulness scoring, hit X% on the golden set."
DON'T
Deep-dive on a project you can't defend
SAY
"Most production work is under NDA. I can walk through architecture and patterns."
Pick 4. Never say "I don't have questions."
"What does your evaluation pipeline look like? How do you measure quality and safety in production?"
"Which cloud platforms are you primarily building on? Bedrock, Azure OpenAI, Vertex?"
"How do you handle model inference in air-gapped or restricted environments?"
"What does success look like in the first 90 days?"
"How much is individual engineering versus collaborative design with the client?"
"What's the team structure?"
"How do you balance 'ship in weeks' with ATO and security reviews?"
"Standalone GenAI apps, or AI integrated into existing federal systems?"
When they ask "any final thoughts?"
"I guess if I had to boil it down — I build things that have to work. My compliance background taught me that. My RAG platform proves I can do it with GenAI. And I've been living inside the tools I use every day long enough to know where the sharp edges are."
"I know I don't have deep federal experience yet. But I understand the constraints — air-gapped inference, audit trails, data classification. And I learn infrastructure fast."
"I'd love to hear more about what the team is actually building."
• 60-second intro (5x aloud)
• ToolChainDev architecture (draw from memory)
• SLI/SLO table
• 4 hallucination techniques
• NDCG = "rewards relevant results ranked higher"
• Bedrock vs Azure vs Vertex (one line each)
• "Regulated multi-jurisdiction compliance platform"
• 4 questions to ask them
• Bookmark toolchain.vercel.app
• Walk through all 13 sections once
• Water, phone silenced, notepad
• Quiet room, neutral background
• 15 min early
You've built this. You can defend it.