How a CBSE Coaching Network Proved AI Grading Handles Every Problem OSM 2026 Has Exposed

10 min read · By Umang Agarwal

TL;DR

In May 2026, CBSE's On-Screen Marking (OSM) rollout for Class 12 board exams generated a national controversy. Students reported blurry scans, missing pages, mismatched answer sheets, correct answers marked zero, and a re-evaluation portal that repeatedly crashed. The Union Education Minister flagged the issues. Class 12 pass percentage dropped from 88.39% (2025) to 85.20% (2026).

OSM is not AI grading — it is human marking on a digital portal. But the failures expose what any digital-evaluation system needs to get right.

Last month, before the OSM controversy went public, we ran a head-to-head study at a leading CBSE coaching network. IntelGrader graded a monthly assessment alongside the network's teachers, and we then audited 588 individual grading decisions against the original student work. The audit happens to be a near-perfect map of the seven problems CBSE OSM is now grappling with, and the data shows what a system built to handle each one looks like.

Result
Human teacher accuracy: 88.6%
IntelGrader effective accuracy: 94.7%
Error reduction: 2.16× fewer mistakes

1. Why this case study matters in May 2026

Across India, students, parents, and coaching-network owners are asking the same question: can digital grading systems be trusted? The OSM controversy has not just damaged confidence in board evaluation — it has put the entire category of digital assessment under public scrutiny.

The honest answer depends on how a digital grading system is built:

  • OSM is a scan-and-display system. A human still does the grading on screen. Quality of marks depends on examiner attention, scan fidelity, and portal stability.
  • AI grading (what IntelGrader does) uses algorithms to do the actual grading. Image quality is checked before grading begins. Page completeness is verified at upload. Every grading decision has an audit trail.

The seven problems exposed in CBSE OSM 2026 each map to a specific design choice that AI grading systems can make differently. This case study walks through how IntelGrader handles each one, with data from a real coaching network's evaluation.

2. The setup

A leading CBSE coaching network ran their May Class 10 Mathematics monthly assessment as normal:

  • 326 students spread across 7 sections
  • A 20-mark paper covering Polynomials, Quadratic Equations, Trigonometry, and Applications of Trigonometry
  • 12 question parts, mix of basic and intermediate difficulty

The teachers graded the papers by hand using the standard rubric — this was their official assessment for the month. Then we asked IntelGrader to grade the same papers independently.

To make the test honest, we then picked the 49 scripts where the AI and the teachers disagreed most and audited every single grading decision on those scripts against the original student work. At 12 question parts per script, that is 49 × 12 = 588 grading decisions reviewed one by one.

This is the worst-case sample. If the AI was going to be wrong somewhere, it is here.

3. The headline result

| Who | Accuracy on 588 audited items |
|---|---|
| Human teachers | 88.6% |
| IntelGrader (raw) | 90.3% |
| IntelGrader (after fixing upload-quality issues) | 94.7% |

Once upload-quality issues are excluded, the AI made 31 true grading errors against the teachers' 67, which is 2.16× fewer.

The "after fixing upload-quality issues" line is important — it ties directly to the CBSE OSM blurry-scan problem, which we will return to in section 5.
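The headline figures can be reproduced from the error counts quoted elsewhere in this case study. The sketch below is only a sanity check, and it rests on one assumption: that "after fixing upload-quality issues" means removing both the 17 glare or cropped-corner errors and the 9 missing-page errors from the AI's 57. The variable names are illustrative.

```python
# Counts as reported in this case study.
AUDITED_ITEMS = 588            # 49 scripts x 12 question parts
HUMAN_ERRORS = 67
AI_ERRORS_RAW = 57
AI_ERRORS_GLARE_OR_CROP = 17   # upload-quality issue
AI_ERRORS_MISSING_PAGE = 9     # upload-quality issue


def accuracy_pct(errors: int, total: int = AUDITED_ITEMS) -> float:
    """Share of audited items graded correctly, as a percentage."""
    return 100 * (1 - errors / total)


# Assumption: "effective" errors exclude both upload-quality categories.
upload_errors = AI_ERRORS_GLARE_OR_CROP + AI_ERRORS_MISSING_PAGE   # 26
ai_errors_effective = AI_ERRORS_RAW - upload_errors                # 31

human_accuracy = accuracy_pct(HUMAN_ERRORS)
ai_raw_accuracy = accuracy_pct(AI_ERRORS_RAW)
ai_effective_accuracy = accuracy_pct(ai_errors_effective)
error_ratio = HUMAN_ERRORS / ai_errors_effective
upload_share_pct = 100 * upload_errors / AI_ERRORS_RAW

print(f"Human teachers:          {human_accuracy:.1f}%")         # 88.6%
print(f"IntelGrader (raw):       {ai_raw_accuracy:.1f}%")        # 90.3%
print(f"IntelGrader (effective): {ai_effective_accuracy:.1f}%")  # 94.7%
print(f"Error ratio:             {error_ratio:.2f}x")            # 2.16x
print(f"Upload-related share:    {upload_share_pct:.0f}%")       # 46%
```

Under that assumption, every published figure (88.6%, 90.3%, 94.7%, 2.16×, and the 46% upload-related share quoted in the FAQ) follows from the same five counts.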

4. What went wrong when humans graded

Of the 67 grading errors human teachers made:

| What happened | Times | Share |
|---|---|---|
| Gave more marks than the rubric called for (often when the student almost showed the right working) | 53 | 79% |
| Missed marks the student had genuinely earned | 14 | 21% |

The pattern is human leniency, not human carelessness. Teachers credit effort that does not quite meet the rubric. Across 326 students × 12 questions, that adds up. The lenient-marking rate also climbed in the back half of the grading batch — fatigue is real. The AI does not get tired and does not know who wrote the paper.

5. The seven OSM failure modes — and how IntelGrader handles each

This is the section that did not exist when we first wrote this case study in early May. In the weeks since, CBSE OSM 2026's failures have been documented by students, parents, education media, and the Union Education Ministry. Every failure maps to a design choice that AI grading systems can make differently.

Failure mode 1: Blurry / unreadable scanned answer sheets

What CBSE students reported: answer sheets whose handwriting was impossible to decipher — "even we can't read them" — yet were still marked.

What our audit found: of the 57 grading mistakes IntelGrader made on 588 items, 17 (30%) were caused by camera glare or cropped corners that made the answer hard to read.

How IntelGrader handles it: image quality detection runs before grading begins. Low-quality scans are flagged at upload and returned to the user with specific feedback (re-take with better lighting, recapture this corner). A page proceeds to grading only after it passes the check. The 17 errors above came from an earlier version without this gate; the gate has since shipped.
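To make the idea of a pre-grading quality gate concrete, here is a minimal sketch. It is not IntelGrader's implementation, which this article does not describe. It swaps in a common heuristic (variance of a Laplacian filter as a sharpness score, plus a count of near-white pixels as a glare check), and every function name and threshold is an assumption chosen for illustration.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Sharpness score for a grayscale image given as a 2D list (at least 3x3).

    Applies a 4-neighbour Laplacian to every interior pixel and returns the
    variance of the responses. Blurred or blank pages score close to zero.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def upload_feedback(gray, min_sharpness=100.0, glare_level=250, max_glare_share=0.25):
    """Return feedback messages; an empty list means the page may be graded.

    All three thresholds are hypothetical and would need tuning on real scans.
    """
    feedback = []
    if laplacian_variance(gray) < min_sharpness:
        feedback.append("Image looks blurred: re-take with better lighting.")
    pixels = [p for row in gray for p in row]
    glare_share = sum(p >= glare_level for p in pixels) / len(pixels)
    if glare_share > max_glare_share:
        feedback.append("Glare detected: move the light source and recapture.")
    return feedback


# Synthetic test pages: a high-contrast pattern and a fully washed-out page.
sharp_page = [[200 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
washed_out_page = [[255] * 8 for _ in range(8)]

print(upload_feedback(sharp_page))       # []
print(upload_feedback(washed_out_page))  # two messages: blur and glare
```

The design point is the ordering rather than the specific heuristic: the check returns actionable feedback at upload, so an unreadable page never reaches the grader.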

Failure mode 2: Missing pages

What CBSE students reported: entire pages skipped in evaluation; page-wise marks did not sum to the total.

What our audit found: 9 grading items (16% of AI errors) traced to a page that was not present in the upload.

How IntelGrader handles it: a page-completeness check runs at upload. The expected page count is matched against the detected pages. If a page is missing, the submission is blocked with specific feedback ("page 3 not detected"). This fix is live in the product.
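A page-completeness check of this kind can be sketched in a few lines. The function name and the input shape are assumptions: the sketch takes for granted that an earlier step has already detected which page numbers are present in the upload.

```python
def completeness_feedback(expected_page_count, detected_page_numbers):
    """Return one message per missing page; an empty list means the upload is complete."""
    detected = set(detected_page_numbers)
    return [
        f"page {page} not detected"
        for page in range(1, expected_page_count + 1)
        if page not in detected
    ]


# A four-page script uploaded without its third page.
print(completeness_feedback(4, [1, 2, 4]))     # ['page 3 not detected']
print(completeness_feedback(4, [1, 2, 3, 4]))  # []
```

A non-empty result would block the submission, which is what keeps a missing page from turning into a silent zero later.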

Failure mode 3: Mismatched answer sheets (wrong student's pages)

What CBSE students reported: the Physics answer sheet displayed during re-evaluation did not belong to the student who requested it.

How IntelGrader handles it: the roll number is bound to the image at scan time. Each page captures the roll number from the header, and any mismatch between the expected and detected roll numbers blocks the submission. The audit found zero mismatches across all 588 grading items in our study.
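The roll-number binding described above amounts to a per-page comparison. The sketch below assumes the roll number on each page header has already been read by an earlier step; the function name, the roll-number format, and the choice to treat an unreadable header (`None`) as a mismatch are all illustrative assumptions.

```python
def mismatched_pages(expected_roll, detected_roll_by_page):
    """Return the page numbers whose header roll number differs from the expected one.

    A page whose roll number could not be read (None) is also reported,
    so it is reviewed rather than silently accepted.
    """
    return sorted(
        page
        for page, roll in detected_roll_by_page.items()
        if roll != expected_roll
    )


# Hypothetical upload: page 3 belongs to a different student.
upload = {1: "R-1042", 2: "R-1042", 3: "R-1187", 4: "R-1042"}
print(mismatched_pages("R-1042", upload))  # [3]
```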

Failure mode 4: Correct answers marked zero

What CBSE students reported: answers matching the official marking scheme but still receiving zero marks.

How IntelGrader handles it: every grading decision has an audit trail. The rubric line, the matched working, the marks awarded, and the error tags (if any) are recorded per question per student. A student or tutor can ask "why was this marked zero?" and get an answer in seconds, not weeks.
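One way to picture such an audit trail is as a record per question per student. The sketch below models the four fields named above (rubric line, matched working, marks awarded, error tags). The class, the field names, and the sample record are invented for illustration and are not IntelGrader's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradingDecision:
    """One audited decision: a single question part for a single student."""
    student_id: str
    question: str
    rubric_line: str
    matched_working: str
    marks_awarded: float
    max_marks: float
    error_tags: tuple = ()


def explain(decisions, student_id, question):
    """Answer "why was this marked the way it was?" from the recorded trail."""
    for d in decisions:
        if d.student_id == student_id and d.question == question:
            tags = ", ".join(d.error_tags) or "none"
            return (
                f"{d.question}: {d.marks_awarded:g}/{d.max_marks:g}. "
                f"Rubric line: {d.rubric_line}. "
                f"Matched working: {d.matched_working}. "
                f"Error tags: {tags}."
            )
    return "No grading decision recorded for this question."


# Illustrative record only; not taken from the study data.
trail = [
    GradingDecision(
        student_id="S001",
        question="Q4(b)",
        rubric_line="Correct use of tan(theta) = height / distance",
        matched_working="no matching step found",
        marks_awarded=0,
        max_marks=2,
        error_tags=("formula",),
    )
]
print(explain(trail, "S001", "Q4(b)"))
```

Because the record is written when the decision is made, answering a challenge becomes a lookup rather than a re-evaluation.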

Failure mode 5: Unevaluated pages

What CBSE students reported: entire pages skipped — examiner did not score them at all.

How IntelGrader handles it: every expected grading decision is tracked. For 326 students × 12 questions, our audit confirmed 3,912 expected grading decisions, all attempted. Anything skipped would surface in the dashboard as "incomplete grading" — never silently zero.
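The coverage check is a set difference between the decisions that should exist and the decisions that were recorded. In the sketch below the student and question identifiers are hypothetical, and one decision is removed on purpose to show how a gap would surface.

```python
from itertools import product


def incomplete_decisions(student_ids, question_ids, recorded):
    """Return the (student, question) pairs that have no recorded grading decision."""
    expected = set(product(student_ids, question_ids))
    return sorted(expected - set(recorded))


students = [f"S{n:03d}" for n in range(1, 327)]   # 326 students
questions = [f"Q{n}" for n in range(1, 13)]       # 12 question parts

recorded = set(product(students, questions))
print(len(recorded))                              # 3912

# Simulate one skipped decision; it is reported rather than scored as zero.
recorded.discard(("S007", "Q5"))
print(incomplete_decisions(students, questions, recorded))  # [('S007', 'Q5')]
```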

Failure mode 6: Re-evaluation portal crashed for days

What CBSE students reported: login failures, payment gateway errors, multi-day outages during peak re-evaluation demand.

How IntelGrader handles it: the product runs on a modern stack (Supabase and Railway) with serverless auto-scaling. There is no payment gateway in the workflow, because re-grading is included. There is also no re-evaluation portal in the traditional sense: every decision is queryable from day one, with no separate retrieval flow.

Failure mode 7: Bright students with inexplicably low marks

What CBSE students reported: top-5%-of-cohort students receiving 70s when they expected 95s.

How IntelGrader handles it: per-student remediation report generated for every student. The report shows what concept each lost mark traces to, not just the final number. Score becomes traceable to learning gaps; surprise marks become diagnostics, not mysteries.

6. The deeper output — what IntelGrader produced beyond grading

The case study has more in it than the OSM-relevant pieces. Marking 326 papers accurately is necessary, but it is not the headline. The headline is the four artifacts the AI produced after grading.

6.1 Class-wide chapter weakness

| Chapter | Class accuracy | Students affected |
|---|---|---|
| Introduction to Trigonometry | 86.4% | 74 / 302 (24%) |
| Quadratic Equations | 76.3% | 86 / 304 (28%) |
| Applications of Trigonometry | 63.2% | 262 / 306 (86%) |
| Polynomials | 55.6% | 299 / 306 (98%) |

When 97.7% of the cohort (299 of 306 students) struggles with the same chapter, here Polynomials, that is a teaching-method gap, not a student-cohort gap. It is worth rethinking how this chapter is taught for the entire batch.

6.2 Section-by-section heatmap

% of students materially struggling, per chapter, per section:

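| Section | Intro to Trigonometry | Polynomials | Quadratic Equations | Applications of Trigonometry | Average |
|---|---|---|---|---|---|
| Section A | 32% | 100% | 54% | 92% | 69.6% |
| Section B | 13% | 98% | 30% | 89% | 57.4% |
| Section C | 31% | 95% | 39% | 95% | 65.0% |
| Section D | 39% | 100% | 18% | 79% | 58.8% |
| Section E | 24% | 98% | 31% | 91% | 60.7% |
| Section F | 22% | 98% | 16% | 73% | 52.2% (best) |
| Section G | 16% | 96% | 17% | 83% | 53.0% |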

6.3 The three action types — and only three

RETEACH — re-explain the concept from scratch. Use when most students did not understand the underlying idea. Bring two worked examples and a short whiteboard recap.

DRILL — run a 5-minute formula-first drill before the next lesson. Use when students recognise the topic but apply the formula wrongly.

PRACTICE — assign 5–10 targeted practice problems for homework. Use when students get the concept and formula but slip on technique.
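The three "use when" rules above can be read as a mapping from the dominant kind of mistake to an action. The sketch below is one hypothetical way to encode that. The error categories (`concept`, `formula`, `technique`) and the majority-vote rule are assumptions, not a description of how IntelGrader ranks its recommendations.

```python
from collections import Counter

# Assumed mapping from the dominant error category to the recommended action.
ACTION_BY_ERROR = {
    "concept": "RETEACH",     # students did not understand the underlying idea
    "formula": "DRILL",       # topic recognised, formula applied wrongly
    "technique": "PRACTICE",  # concept and formula fine, slips in execution
}


def recommend_action(error_tags):
    """Pick the action for the most frequent known error category, or None if there is none."""
    counts = Counter(tag for tag in error_tags if tag in ACTION_BY_ERROR)
    if not counts:
        return None
    dominant, _ = counts.most_common(1)[0]
    return ACTION_BY_ERROR[dominant]


# Hypothetical error tags collected for one question across a section.
print(recommend_action(["formula", "formula", "concept", "technique"]))  # DRILL
print(recommend_action([]))                                              # None
```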

6.4 The 21 recommendations IntelGrader generated for this paper

Three priorities per section × 7 sections = 21 ranked actions. Six of seven sections share the same #1 problem — "finding height with two angles of elevation." That is a network-wide finding: one curriculum tweak, applied across every section, would lift the whole batch.

6.5 326 personalised practice papers

For every student, IntelGrader builds a custom 5-question practice paper from their actual answer-by-answer evidence. The paper contains easier versions of the questions on which they lost the most marks. No tutor team can do this manually at this scale.
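The selection step behind such a paper can be sketched as a ranking of questions by marks lost. This is a simplified illustration under stated assumptions: the function name, the tie-breaking rule, and the sample marks are invented, and generating the easier variant of each question is out of scope here.

```python
def practice_targets(marks_lost_by_question, paper_length=5):
    """Return up to `paper_length` questions, ordered by marks lost (largest first).

    Questions with no marks lost are excluded; ties are broken by question label.
    """
    lost = {q: m for q, m in marks_lost_by_question.items() if m > 0}
    ranked = sorted(lost, key=lambda q: (-lost[q], q))
    return ranked[:paper_length]


# Hypothetical marks lost by one student on eight question parts.
marks_lost = {"Q1": 0, "Q2": 2, "Q3": 1, "Q4": 3, "Q5": 0.5, "Q6": 1, "Q7": 2, "Q8": 0}
print(practice_targets(marks_lost))  # ['Q4', 'Q2', 'Q7', 'Q3', 'Q6']
```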

7. Why this matters now

The OSM controversy has made one thing clear: digital evaluation is here to stay, but the bar for "trustworthy" has just been raised. Coaching networks that can demonstrate transparent, auditable, AI-powered grading — with proper image quality validation, per-student diagnostics, and an actual remediation output — will differentiate sharply from those relying on legacy systems.

This is not a CBSE problem. It is a category problem. Every coaching network, every school, every tutoring chain needs to answer the same question: how do you know your marking is accurate, and what are you doing with the result?

The case study above is one answer.

8. Methodology notes

  • Audit sample: the 49 scripts with the highest grading disagreement between the AI and the teachers. Worst-case selection.
  • Audit unit: 588 individual grading decisions, reviewed one by one against the original student work.
  • Anonymisation: coaching network name, section codes, and all student names have been removed. The numerical data is reported verbatim from the dashboard.

FAQ

How is AI grading different from CBSE's On-Screen Marking (OSM)?

OSM is human marking on a digital portal — the examiner still does the grading. AI grading uses algorithms to do the actual marking. Image quality, page completeness, and grading decisions are all algorithmic and auditable in AI grading; OSM relies on examiner attention plus scan fidelity.

How does IntelGrader handle blurry or unreadable scans?

Image quality is checked at upload, before grading begins. Low-quality scans are flagged with specific feedback (lighting, framing, missing corners) and returned to the user. Only after the image passes quality checks does it proceed to grading.

Is IntelGrader's accuracy really higher than experienced teachers'?

On 588 audited grading items at a CBSE coaching network, IntelGrader's effective accuracy was 94.7% vs 88.6% for the network's teachers, a 2.16× reduction in true grading errors. The methodology favoured the human side: we audited the scripts where the AI and the teachers disagreed most, which is the worst case for the AI.

Why do AI grading systems sometimes get it wrong?

The most common cause of AI "errors" is image quality, not the algorithm. In our audit, 46% of IntelGrader's errors (26 of 57) traced to camera glare, cropped corners, or missing pages. These are operational issues that are fixed with better submission guidelines, not algorithmic failures.

What about CBSE coaching centres specifically?

Indian coaching centres face the same evaluation challenges as boards — at higher frequency. IntelGrader is designed for handwritten English and Hindi answer scripts, NEET/JEE/CBSE/ICSE-pattern marking schemes, and the weekly testing cadence Indian centres run on.

Try this on your students' data

If you run a coaching network or school and want to see how IntelGrader handles your specific evaluation workflow:

  • Book a demo: /book-demo
  • Run a free pilot: we will grade your next monthly assessment alongside your teachers at no cost. You see the dashboard, the per-section action plan, and the personalised practice papers — before any commitment.
Umang Agarwal
Co-Founder at IntelGrader. Ex-P&G, IIM Calcutta. Focused on product and business development for AI-powered education tools.
