The Beginning: A Problem

There was a time when, every month, I spent 10-15 minutes reconciling on-call rotas for payroll - and that was just for my teams. As an engineering manager responsible for multiple teams, I found the task boring but necessary: it was how my engineers were compensated for bearing the inconvenience of on-call responsibilities. I’d manually review PagerDuty schedules, count weekday versus weekend hours, apply different compensation rates, and compile everything for payroll processing. The more teams you had, the more time it took out of an already tightly packed management schedule.

In SRE terms, it was toil. It was exactly the kind of problem that should have been automated.

So, as part of a Firebreak (hackathon) at work, I built CalOohPay - a command-line tool that automates the calculation of out-of-hours on-call compensation for engineering teams. I didn’t have a team to join me during the hackathon, since the whole point of the tool was to remove toil for engineering managers, and I didn’t have the power of AI at the time either. We all know how engineering managers shield teams from chaos while translating and responding to asks from senior leadership; stretched from both sides, they have a very high burn-out rate unless they are self-motivated. Although I got a basic working prototype done during the firebreak to showcase success, I polished it into a more reasonable solution over the following week, spending an hour or two a night for about four nights. The CLI picked up a few users across the organisation, including myself.

Then, in October 2025, I decided to explore what I could do with AI. That sparked many ideas, and I got to work: polishing CalOohPay further with additional timezone support, publishing a browser-compatible library on npm, and then using that library to build a web application that could open the tool up to more users. That was really satisfying.

This part of the journey to a production-ready web application taught me something profound about modern software development: AI-assisted coding isn’t about replacing developers; it’s about amplifying our ability to deliver value.

The Original Vision: CLI First

The first iteration of CalOohPay was straightforward:

npx caloohpay -r "SCHEDULE_ID" -s "2024-01-01" -u "2024-01-31"

A Node.js CLI tool that:

  • Fetched schedule data from PagerDuty’s API
  • Calculated compensation based on weekday vs weekend rates
  • Supported multiple teams and schedules
  • Handled timezone conversions automatically
  • Generated auditable records for payroll
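
To give a flavour of the core logic, here is a minimal sketch of the kind of weekday/weekend split the tool performs. The rate values, types, and function name are illustrative - they are not the published caloohpay API - and Luxon handles the timezone-aware date maths:

import { DateTime } from 'luxon';

// Illustrative rates - the real tool takes these as configuration
const WEEKDAY_RATE = 50;
const WEEKEND_RATE = 75;

interface OnCallPeriod {
  start: string;    // ISO timestamp from PagerDuty
  end: string;      // ISO timestamp from PagerDuty
  timeZone: string; // schedule time zone, e.g. 'Europe/London'
}

export function calculateCompensation(periods: OnCallPeriod[]): number {
  let total = 0;
  for (const period of periods) {
    const end = DateTime.fromISO(period.end, { zone: period.timeZone });
    // Walk the period one calendar day at a time in the schedule's own time zone
    let day = DateTime.fromISO(period.start, { zone: period.timeZone }).startOf('day');
    while (day < end) {
      // Luxon: weekday runs 1 (Monday) to 7 (Sunday)
      total += day.weekday >= 6 ? WEEKEND_RATE : WEEKDAY_RATE;
      day = day.plus({ days: 1 });
    }
  }
  return total;
}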

I built this project using traditional development methods - research, design, code, test, iterate. It worked. It solved my problem. But I knew it could reach more people if it had a web interface. In fact, there were non-engineering teams - product specialists, for example - who go on-call too, and they have to do similar accounting every month.

This motivated me even more to act on the idea of making a web based version of CalOohPay.

Enter GitHub Copilot with Claude Sonnet 4.5

I’d used GitHub Copilot before for code completion, but the integration with Claude Sonnet 4.5 represented a paradigm shift. This wasn’t just autocomplete on steroids; it was like having a highly capable junior developer who could:

  • Understand complex requirements
  • Generate entire components - as someone who has spent the majority of his career in backend development, I found generating front-end code a breeze now!
  • Suggest architectural patterns - an excellent companion to bounce ideas off
  • Debug issues collaboratively - very context-dependent and occasionally frustrating, but still really good
  • Write tests and documentation

But the real power came from learning how to work with it effectively.

The Critical Success Factor: Clear Constraints

Here’s what I learned quickly: garbage in, garbage out applies to AI just as much as to traditional software.

The difference between mediocre and exceptional AI-assisted development lies in the constraints you establish upfront. Before I started building the web application, I created:

1. Copilot Instructions

I documented my project’s conventions in .github/copilot-instructions.md, starting with some basics like the following:

## Project Context
- TypeScript-first development
- Next.js App Router (not Pages Router)
- Material UI v5 for components
- Strict type safety - no 'any' types
- Functional components with hooks (no class components)
- Server Components by default, Client Components only when needed

## Code Style
- Use named exports, not default exports
- Prefer const over let
- Use async/await over promises.then()
- Extract magic numbers to constants
- Write descriptive variable names

## Testing Requirements
- Unit tests for all business logic
- Component tests for UI components
- E2E tests for critical user flows

These instructions acted as a contract between me and the AI. As development progressed, however, Claude suggested further changes to the instructions, making them what they are today, including:

  • how code should be organised in folders, and what each folder’s purpose is
  • conventions for import statements
  • what the routes look like, and how not to duplicate the caloohpay base library’s functionality
  • the authentication systems

Every suggestion it made adhered to the constraints described, creating consistency across the codebase. It’s like enforcing best practices and standards in a team-owned repository, but with nothing more than a markdown file.
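
To make that concrete, here’s an illustrative snippet (not taken from the repository) of the shape of code those constraints push the AI towards - a named export, async/await, and no magic strings:

// Illustrative only - shows the style the instructions enforce
const SEARCH_ENDPOINT = '/api/schedules';

export async function searchSchedules(query: string): Promise<string[]> {
  const response = await fetch(`${SEARCH_ENDPOINT}?query=${encodeURIComponent(query)}`);
  if (!response.ok) {
    throw new Error(`Schedule search failed with status ${response.status}`);
  }
  return response.json();
}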

Check out the latest copilot-instructions on the repository.

2. Specific Prompts

Instead of prompts that just said “make it better” - which might work with human beings - I learned to be very specific:

Bad: “Add a calendar view”

Good: “Create a CalendarView component using FullCalendar with Luxon for date handling. It should display on-call schedules in monthly view, color-code different users, and show time ranges on hover. Use Material UI theming for colors.”
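
For illustration, here is a heavily simplified sketch of the kind of component that prompt yields - the prop shapes and colour handling are mine, not the project’s actual code:

import FullCalendar from '@fullcalendar/react';
import dayGridPlugin from '@fullcalendar/daygrid';
import { DateTime } from 'luxon';

interface OnCallEntry {
  user: string;
  start: string; // ISO timestamps from the schedule
  end: string;
  color: string; // per-user colour, ideally taken from the MUI theme palette
}

export function CalendarView({ entries }: { entries: OnCallEntry[] }) {
  const events = entries.map((entry) => ({
    title: `${entry.user} (${DateTime.fromISO(entry.start).toFormat('HH:mm')} - ${DateTime.fromISO(entry.end).toFormat('HH:mm')})`,
    start: entry.start,
    end: entry.end,
    color: entry.color,
  }));

  return <FullCalendar plugins={[dayGridPlugin]} initialView="dayGridMonth" events={events} />;
}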

The specificity led to better initial results and fewer iterations. Sometimes I’d delegate such a well-specified task to the cloud agent and get notified in less than 20 minutes that the feature was implemented. And sometimes, despite the specificity, it would write code that looked like it worked, but didn’t.

So when I got back to the pull request, I had to make sure I didn’t just eyeball the changes, but checked out the branch locally and tested it. There were plenty of times when the project wouldn’t even run properly, or multiple existing tests would fail while the newest component’s tests passed.

Even so, with the speed at which Copilot with Claude generated code, I am certain it was still faster than learning Next.js from scratch and experimenting on my own.

The Development Workflow: Human + AI

Phase 1: Architecture and Planning (Human-Led)

As someone with an engineering background, I didn’t like the idea of asking the AI to guess at what I wanted. Just like the best-practices constraints in the Copilot instructions I shared earlier, I wanted to stay in control of the high-level decisions - the tech stack and so on:

  • What features to build
  • Technology stack choices
  • Security considerations
  • User experience flow

I think this is crucial. Humans must own the strategic thinking.

Phase 2: Implementation (Collaborative)

Once I knew what to build, AI accelerated how I built it.

Component Generation: I’d describe a component’s requirements, and Copilot would generate a first draft:

// Me: "Create a ScheduleSearchBar component with debounced search,
//      loading state, and Material UI autocomplete"

// Copilot generated 80% of this (imports/Props added here for completeness;
// the useDebounce path and onScheduleSelect signature are illustrative):
import { useState } from 'react';
import { useDebounce } from '../hooks/useDebounce';

interface Props {
  onScheduleSelect: (scheduleId: string) => void;
}

export function ScheduleSearchBar({ onScheduleSelect }: Props) {
  const [searchTerm, setSearchTerm] = useState('');
  const [loading, setLoading] = useState(false);
  const debouncedSearch = useDebounce(searchTerm, 300);
  // ... rest of implementation
}

There were times when, despite the Copilot instructions, it generated components that mixed presentation and business logic, so I had to iterate on those changes to bring them back in line with what was originally instructed. I suspect this was partly because I sometimes let VS Code’s Copilot auto-select whichever model was available - which gives you a 10% discount on cost, but might get you the wrong model for the task. In my experience, Claude Sonnet 4.5 was incredibly helpful, and Gemini 3 Pro was also good, though I only reached for the latter when Claude struggled with debugging. Gemini Pro often took longer to respond than Claude - but those were tests from my laptop rather than the cloud agent.

Test Generation: For every component, I’d ask for tests:

// Me: "Write comprehensive tests for ScheduleSearchBar"

// Copilot generated test cases including:
// - Debounce behaviour
// - Loading states
// - User interactions
// - Error handling
// - Accessibility checks
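
As a concrete example, one of those generated tests looked roughly like this - assuming Vitest and React Testing Library, which may differ from the project’s actual setup:

import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { describe, it, expect, vi } from 'vitest';
import { ScheduleSearchBar } from './ScheduleSearchBar';

describe('ScheduleSearchBar', () => {
  it('does not trigger a selection while the user is still typing', async () => {
    const onScheduleSelect = vi.fn();
    render(<ScheduleSearchBar onScheduleSelect={onScheduleSelect} />);

    // MUI Autocomplete exposes its input with the combobox role
    const input = screen.getByRole('combobox') as HTMLInputElement;
    await userEvent.type(input, 'platform');

    // The debounced search should not have selected anything yet
    expect(onScheduleSelect).not.toHaveBeenCalled();
    expect(input.value).toBe('platform');
  });
});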

API Integration: The AI understood PagerDuty’s API structure and generated type-safe clients:

// It created properly typed API responses
export interface PagerDutySchedule {
  id: string;
  name: string;
  time_zone: string;
  schedule_layers: ScheduleLayer[];
}

// And the corresponding service methods
export async function fetchSchedule(
  scheduleId: string
): Promise<PagerDutySchedule> {
  // Implementation with proper error handling
}
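
For context, “proper error handling” amounted to something along these lines - a sketch using PagerDuty’s GET /schedules/{id} endpoint and token authentication; the environment variable name is illustrative, not necessarily the project’s:

export async function fetchSchedule(
  scheduleId: string
): Promise<PagerDutySchedule> {
  const response = await fetch(`https://api.pagerduty.com/schedules/${scheduleId}`, {
    headers: {
      Authorization: `Token token=${process.env.PAGERDUTY_API_TOKEN}`,
      Accept: 'application/json',
    },
  });

  if (!response.ok) {
    throw new Error(`PagerDuty returned ${response.status} for schedule ${scheduleId}`);
  }

  // PagerDuty wraps the resource in a top-level "schedule" key
  const body = await response.json();
  return body.schedule as PagerDutySchedule;
}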

Phase 3: Debugging (True Collaboration)

This is where the AI partnership was truly necessary. When things broke, I could:

  1. Paste the error message: The AI would analyze and suggest fixes
  2. Share problematic code: It would identify the issue, sometimes get lost in the problem too, and propose solutions
  3. Discuss trade-offs: I’d explain the problem domain, and it would suggest architectural improvements

But here’s the critical insight: sometimes the AI needed help from me.

There were moments when:

  • The AI sometimes suggested solutions that would compile and run but were logically wrong
  • Some edge cases specific to PagerDuty’s API behaviour weren’t handled

In these moments, I’d provide context:

Me: "That solution works for most cases, but PagerDuty's override 
     feature can create overlapping schedules. We need to detect 
     and handle overlaps to avoid double-counting hours."

Copilot: "Good point. Let's add an overlap detection algorithm..."

This back-and-forth felt remarkably like pair programming with a junior developer who learns quickly.
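
The overlap handling we landed on boiled down to a classic interval merge. A simplified sketch - not the exact code in the repository:

interface Shift {
  start: number; // epoch milliseconds
  end: number;
}

export function mergeOverlappingShifts(shifts: Shift[]): Shift[] {
  const sorted = [...shifts].sort((a, b) => a.start - b.start);
  const merged: Shift[] = [];

  for (const shift of sorted) {
    const last = merged[merged.length - 1];
    if (last && shift.start <= last.end) {
      // An override overlapping a layer: extend the existing interval
      // instead of double-counting the shared hours
      last.end = Math.max(last.end, shift.end);
    } else {
      merged.push({ ...shift });
    }
  }

  return merged;
}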

Cloud Agents: Parallel Development

One of the most powerful features was using cloud agents for long-running tasks. Instead of sitting and watching tests run or builds compile, I could:

  1. Kick off the task: “Run the full test suite and fix any failing tests”
  2. Continue working: Move on to the next feature while the agent worked
  3. Review results: Come back to see completed work or specific issues flagged

This parallel workflow transformed my productivity. I wasn’t blocked waiting for builds or test runs. The AI agent worked independently, and I’d review its changes when ready.

Example workflow:

9:00 AM - Me: "Run E2E tests and fix any failures"
9:01 AM - Start working on the CSV export feature
9:45 AM - Agent: "Fixed 3 failing tests, 1 requires your input on expected behaviour"
9:50 AM - Review fixes, provide clarification on the one test
10:00 AM - Both tasks complete

Without cloud agents, this would have been sequential, taking nearly twice as long.

The Results: Measuring Impact

Let me be concrete about the impact:

Speed

Traditional estimate: 3-4 months for a production-ready web application
Actual time with AI assistance: 6 weeks

That’s a 50-60% reduction in development time.

Code Quality

  • Test coverage: 87% (higher than my usual ~70%)
  • Type safety: 100% TypeScript, zero ‘any’ types
  • Accessibility: WCAG 2.1 AA compliant
  • Bundle size: Optimized with automatic code splitting

The AI didn’t just write code faster; it wrote very good code. That didn’t come naturally - it was the constraints laid out at the start that kept it within guidelines. Those constraints ensured that:

  • It always added tests
  • It suggested accessibility attributes that I might not even have considered

Feature Completeness

The fact that I could generate an application with the following features with such a small time investment was a big win! Look at the feature list and you’ll see why I am genuinely impressed.

  • Dual authentication (OAuth 2.0 + API Token)
  • Progressive search with instant local results
  • Multiple visualization modes (List + Calendar)
  • Comprehensive CSV export
  • Full keyboard navigation
  • Dark mode support

With AI assistance, I was more than happy to add to the scope, simply to test ideas and learn quicker. Considering how frustrated we humans can get with repeated changes to scope, I find this method - an agent mindlessly following prompts - a significant advantage for rapid product development.

In CalOohPay Web, I wouldn’t have implemented data visualisation at all if it were down to my development skills alone; I hadn’t touched a charting library since I left FactSet. Furthermore, I am a new parent, and finding time to explore and experiment after regular working hours, when the toddler is home, is extremely challenging - unless I plan in advance with my wife.

The Deliverables: Real World Impact

Today, CalOohPay Web is:

  • Ready for production: Could be used by anyone in any organisation in its current form - if they use PagerDuty to manage on-call shifts, they can use CalOohPay’s web UI
  • Open source: MIT licensed for others to benefit - I am very tempted to change this later
  • Well-documented: Comprehensive README and setup guides, updated with every change - a stark contrast to real-world development, where developers barely get the time to finish the software itself, so asking for documentation would probably add another sprint
  • Actively maintained: Regular updates and improvements - entirely up to me, really

The command-line tool CalOohPay is:

  • Published on npm: npm install -g caloohpay
  • Used as a library: Integrated into other tools
  • Well-tested: Comprehensive test suite
  • Documented API: Full API documentation available

What started as a personal productivity tool has become a useful resource for the engineering community.

Lessons Learned: What Works

1. Start with Strong Constraints

The quality of AI output is directly proportional to the quality of your constraints. Invest time upfront in:

  • Clear architectural decisions
  • Project conventions - file organisation, programming-language style, indentation, etc.
  • Testing expectations - cover behaviours and be pragmatic, rather than testing every line of code

2. Maintain Strategic Control

Let the AI agent handle tactical implementation - that’s the help you need. But keep firm control over the following:

  • Feature prioritization - ask for suggestions and research options with AI tools to see what might be more valuable, but ultimately you choose.
  • Architecture decisions - already mentioned earlier, and I cannot stress the importance of human involvement here enough.
  • Security considerations - coding agents can help identify vulnerabilities and insecure coding practices in the repository. Use this to your advantage and make sure you prioritise fixing the right things. And when it comes to system-level security, always assess the trade-offs between the various authentication methods yourself before asking the coding agent to use one.
  • User experience design - here again, GenAI tools can be enthusiastic about implementing something a certain way. Think it through yourself: does it feel intuitive, are the buttons where a human would expect them, and when transitioning from one screen to another, should everything really be reloaded?

3. Review Everything

AI-generated code should be reviewed as carefully as human-generated code:

  • Does it handle edge cases?
  • Are there security implications?
  • Is it maintainable?
  • Does it follow best practices?

If you aren’t being the brain of the operation, then coding, or code generation, is just mundane; the logical problems still have to be solved in your head. Of course, there is nothing wrong with being a bit lazy and asking the AI to come up with something just to see how it would implement it. But it is crucial to test and review the application to know it does what was intended.

There were many times when Claude would finish an implementation so quickly that I struggled to keep up with reviews. But I still made sure I spent time reading and understanding the code.

Trust, but verify.

4. Iterate on Prompts

Like any skill, writing effective prompts improves with practice:

  • Start specific, get more specific
  • Provide context about the domain
  • Reference existing patterns in your codebase
  • Ask for alternatives when the first suggestion misses

I actually went all in and wrote an entire feature specification, describing what I wanted from the overview down to the tiniest detail of how the page should look, until I had covered all the inter-component interactions I expected in my head. Then I iterated on this as the agent delivered the code.

What I found very useful was to have a chat with the agent first about the approach it thought would work; after reviewing that plan, I’d either suggest changes or try the suggested approach. But when implementing an entirely new page, I found more success in writing a full specification covering the outcome, the layout of the page, and what information must be displayed on it versus what can sit behind a button press - and then reiterating the best practices: test behaviours at the page level, keep business logic separate from the user interface, and so on.

5. Use the Right Tool for the Job

Not every task benefits equally from AI assistance:

Great for AI:

  • Boilerplate code generation
  • Test writing
  • Type definitions
  • Standard component patterns
  • Documentation - keeping it up to date

Better done by humans:

  • Novel algorithm design
  • Complex business logic with many constraints
  • Security-critical code
  • Performance optimization
  • API design

Challenges and Limitations

As I have hinted earlier, not everything went smoothly, and I wasn’t always impressed. There were some debugging sessions where I felt I had to do more hand-holding than necessary. So here are some of the challenges to look out for:

1. Context Window Limitations

Large files or complex refactorings sometimes exceeded the AI’s context window. I’d need to break the work into smaller chunks.

Solution: Structure code in smaller, focused modules that fit within context limits.

2. Hallucinations

Occasionally, the AI would confidently suggest APIs or libraries that didn’t exist, or misremember API signatures. This actually happened with Husky - the library used to create git hooks.

Another instance happened while debugging: the AI would try to fix a test that was no longer relevant. Point it at specific files, or otherwise give it the context it needs, so that it can choose the most appropriate way to fix a problem. This was especially true after a major feature change.

Solution: Always verify against official documentation. Treat AI suggestions as drafts, not gospel. You really need to apply scientific thinking to solving problems with AI. You can’t be the religious kind.

3. Inconsistent Patterns

Without clear constraints, the AI might use different patterns for similar problems across the codebase. Switching between models also gets you very different approaches, so bear that in mind when solving a problem within a session.

Solution: Establish patterns early and reference them explicitly in prompts. Copilot instructions are a good place for this.

4. Over-Engineering

Sometimes the AI suggests elaborate solutions for the simplest of problems. I don’t remember the exact details, but I have had moments where I had to prompt “why not xyz?”, and only then would it consider the simpler alternative.

Solution: Explicitly request “the simplest solution that works” when appropriate.

The Future of AI-Assisted Development

I think we’re going through a fundamental shift in software development. As an engineer today, I can imagine myself taking the following approach when building something:

  1. Writing down ideas often helps us distil our thoughts, which is exactly why I started blogging too. Thinking a feature through and writing it down makes you question the idea yourself; you take alternate routes and explore them properly, as if challenging yourself, before creating a product specification. This has greatly improved my perspective on the software I was building.
  2. Through this process, you get to be the user and think about how, as a user, you would have loved to do something, or wished a certain thing could be accomplished with fewer clicks on the screen. The shift in perspective really matters, because now you also know what you should be testing.
  3. Not treating code as a personal artefact is the biggest shift. Engineers often get too attached to their code and become defensive about genuinely helpful comments that could improve their solution. With AI-assisted development there is less likelihood of this happening, because the code was generated by an AI agent; you are more open to criticism and you see what you would otherwise have missed.

It’s Not About Replacing Developers

AI won’t replace developers any more than compilers replaced assembly programmers. Instead, it’s raising the level of abstraction at which we work:

  • Then: we wrote assembly; now we write high-level languages
  • Now: we write code; in the future, we’ll orchestrate AI that writes code

It is bound to happen with every advancement. Higher level languages will emerge. Software development will feel more declarative.

It’s About Amplification

The best analogy is power tools in construction:

  • A hammer doesn’t replace a carpenter
  • But a nail gun makes that carpenter 10x more productive

AI is the nail gun of software development. Nothing beats the speed of AI code generation; even if it isn’t the best code at first, the speed helps you iterate over solutions faster. You literally get to apply the principle of “get it working, then get it right”.

It Changes What’s Possible

With AI assistance, projects that seemed too ambitious for a solo developer become achievable. This is 100% true in my case. I may have been a full-stack developer with experience across the stack, but I would never consider myself a front-end developer; across my entire career, the time I’ve spent on front-end work amounts to less than 30% of what I’ve spent on back-end development.

So when it came to developing the web version of CalOohPay, I wouldn’t have been able to deliver it in the little time I had, without AI’s help. The alternative would be to find a UI developer who is equally passionate about the problem to invest time outside of their day job, or perhaps hire someone full time.

The minimum viable team size is shrinking. One developer with AI assistance can accomplish what previously required a team. No offence to companies that still hire and maintain large teams - it is a genuine shift in thinking. Do you need all the developers focussing on the same problem? Could you augment capacity using AI code generation, divide teams further, and solve more valuable problems?

I can only imagine the throughput gains if this is done properly.

But the foundation of it all is trust: trusting your developers to do the right thing with AI-assisted development, and not just sit around accepting changes.

Conclusion: The Craft of Software Engineering Evolves

Building CalOohPay with AI taught me that we’re not just gaining a new tool; we’re evolving our craft.

The essence of software engineering remains unchanged: understanding the problem, making thoughtful architectural decisions, writing clean, maintainable, secure code for humans, and ultimately delivering value to your target users.

But the mechanics are transforming. AI handles the tedious, repetitive, and boilerplate work, freeing us to focus on the creative, strategic, and uniquely human aspects of engineering.

The result?

  • Faster delivery without sacrificing quality
  • More ambitious projects become feasible
  • More time for innovation and problem-solving

Generative AI becomes an amplifier of your engineering capabilities, elevating you to do higher-level thinking.

As I look at CalOohPay’s Web version today, I’m excited about what this means for the future of our craft.

The software craftsperson now wields Generative AI alongside other tools in the quest to build excellent software.


Interested in trying CalOohPay? Check out the CLI tool or web application. Both are open source and free to use for now.

Want to learn more about AI-assisted development? Follow along as I document more projects and lessons learned in this series.