The 7 Things That Went Wrong With CBSE OSM in 2026 — and How a Properly-Built AI Grading System Handles Each One

TL;DR
CBSE's 2026 OSM rollout exposed seven specific failure modes, each documented by students, parents, and education media during May 2026. Each failure traces to a design choice that any evaluation system has to make. This post walks through the seven in turn: what each looks like to students, and how a properly-built AI grading system avoids it by design. The "AI grading does it differently" claims are backed by an audit of 588 grading decisions at a CBSE coaching network, documented in our case study.
Setting expectations
To be clear: this is not a takedown of CBSE. OSM is a digital interface for human examiners. The failure modes below trace primarily to design choices around image quality, completeness verification, and audit trails — choices that can be made differently in any digital evaluation system. The point of this post is to show what "made differently" looks like, with data.
Failure mode 1: Blurry / unreadable scanned answer sheets
What students reported in 2026: Students applied for photocopies of their answer scripts under re-evaluation. The scanned copies that came back were, in many cases, effectively unreadable — handwriting blurred, contrast too low, corners cropped. Parents publicly asked the obvious question: how can an examiner accurately grade what they cannot read?
What this looks like in practice: The examiner sees the same blurry image the student later sees. Faced with hundreds of such scripts on a deadline, the human reaction is to guess at unclear answers or skip ambiguous sections. Marks get awarded conservatively or randomly.
How a properly-built AI grading system handles this: Image quality validation runs before any grading begins. A scan is rejected at upload if:
- Average sharpness falls below a threshold
- Contrast is too low for OCR to be reliable
- Corners are missing (cropped frame)
- Glare obscures more than a small percentage of the writable area
The system returns specific feedback: "page 2 needs to be re-captured — too much glare in the top-right quadrant." The student or invigilator re-scans before the paper enters grading. No grading happens on an image the system itself cannot read.
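The checks listed above can be sketched in a few lines of Python. This is a minimal illustration, not IntelGrader's implementation: it assumes each page arrives as a grayscale grid of 0-255 values, uses the variance of a simple Laplacian as a stand-in for sharpness, and every name and threshold (`Thresholds`, `check_page`, the numeric limits) is hypothetical. Cropped-corner detection is left out because it needs page-boundary detection.

```python
from dataclasses import dataclass
from statistics import pstdev, pvariance


@dataclass(frozen=True)
class Thresholds:
    # All values are illustrative assumptions, not tuned figures.
    min_sharpness: float = 100.0      # variance of the Laplacian response
    min_contrast: float = 30.0        # standard deviation of pixel intensity
    glare_level: int = 250            # pixels at or above this count as glare
    max_glare_fraction: float = 0.05  # share of the page allowed to be glare


def laplacian_variance(img):
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry scan."""
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def check_page(img, page_no, t=Thresholds()):
    """Return a list of re-capture reasons; an empty list means the page passes."""
    pixels = [p for row in img for p in row]
    problems = []
    if laplacian_variance(img) < t.min_sharpness:
        problems.append(f"page {page_no} needs to be re-captured: image is too blurry")
    if pstdev(pixels) < t.min_contrast:
        problems.append(f"page {page_no} needs to be re-captured: contrast is too low")
    glare = sum(p >= t.glare_level for p in pixels) / len(pixels)
    if glare > t.max_glare_fraction:
        problems.append(f"page {page_no} needs to be re-captured: too much glare")
    return problems


# A sharp, high-contrast test pattern passes; a flat grey page does not.
checkerboard = [[240 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
flat_grey = [[128] * 8 for _ in range(8)]
print(check_page(checkerboard, 1))
print(check_page(flat_grey, 2))
```

The design point is that `check_page` runs before grading and returns reasons a person can act on, so a failed page goes back for re-capture instead of forward to the grader.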
Evidence from our case study: Of the 57 grading mistakes IntelGrader made on 588 audited items, 17 (30%) were caused by image quality issues such as camera glare and cropped corners. With the image quality gate in place, these are caught at upload rather than at grading. The effective accuracy figure of 94.7% excludes these from the algorithmic error rate.
Failure mode 2: Missing pages
What students reported in 2026: Page-wise marks in scanned re-evaluation copies did not always sum to the final total. Students who counted their answer-sheet pages physically (e.g. 14 written pages) received scans of fewer pages (e.g. 12). The missing pages appeared to have been simply not scanned.
How a properly-built AI grading system handles this: Page-completeness verification at upload. The system knows how many pages the paper template expects. Every uploaded set is matched against that count. If a page is missing, submission is blocked with specific feedback ("expected 14 pages, got 12 — page 3 and page 9 missing"). The student or staff must re-capture before submission proceeds.
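A completeness check of this kind is small enough to show in full. The sketch below assumes the page number of each uploaded image has already been detected (for example from a printed page footer); the function name `completeness_feedback` and the message wording are illustrative.

```python
def completeness_feedback(expected_pages, detected_page_numbers):
    """Return None if the upload is complete, else a message that blocks submission."""
    received = set(detected_page_numbers)
    missing = [n for n in range(1, expected_pages + 1) if n not in received]
    if not missing:
        return None
    listed = ", ".join(f"page {n}" for n in missing)
    return f"expected {expected_pages} pages, got {len(received)}: missing {listed}"


# A 14-page template uploaded without pages 3 and 9.
uploaded = [1, 2, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14]
print(completeness_feedback(14, uploaded))
```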
Evidence from our case study: Missing-page issues accounted for 9 errors (16% of AI errors) in our 588-item audit. After shipping the completeness check, these go to zero in subsequent batches.
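The case-study figures quoted for failure modes 1 and 2 can be cross-checked with a little arithmetic. The sketch below uses only numbers stated in this post. Reading the 94.7% figure as excluding both image-quality and missing-page errors is an inference from the arithmetic, not something the post states outright.

```python
audited_items = 588
ai_errors = 57
image_quality_errors = 17
missing_page_errors = 9


def accuracy(errors, total=audited_items):
    """Share of audited items graded correctly, as a percentage to one decimal."""
    return round(100 * (total - errors) / total, 1)


raw = accuracy(ai_errors)
without_image = accuracy(ai_errors - image_quality_errors)
without_both = accuracy(ai_errors - image_quality_errors - missing_page_errors)
print(raw, without_image, without_both)  # 90.3 93.2 94.7
```

On these numbers, the quoted 94.7% matches the case where both input-stage error classes (17 + 9 = 26 of the 57 errors) are caught before grading, leaving 31 errors on 588 items.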
Failure mode 3: Mismatched answer sheets
What students reported in 2026: In one widely-shared case, a Class 12 student requested their Physics answer sheet via re-evaluation and received pages from someone else's paper. This raised concerns about the script-to-student binding in OSM.
How a properly-built AI grading system handles this: Roll number is bound to the image at scan time. Each scanned page captures the roll number from the answer-sheet header. The system checks that:
- All pages of a script share the same roll number
- The roll number on the script matches the roll number assigned by the centre
- No page is bound to two different students
If any of these fail, the script is flagged for manual inspection before grading begins.
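The three checks above can be expressed as one pass over the uploaded scripts. This is a sketch under stated assumptions: the roll number has already been read from each page header, each page image has a stable identifier (such as a content hash), and the names `binding_issues`, `scripts`, and `assigned_roll` are hypothetical.

```python
from collections import defaultdict


def binding_issues(scripts, assigned_roll):
    """Flag scripts or pages whose roll-number binding looks wrong.

    scripts:       {script_id: [(page_image_id, roll_read_from_header), ...]}
    assigned_roll: {script_id: roll number assigned by the centre}
    """
    issues = []
    scripts_per_page = defaultdict(set)
    for script_id, pages in scripts.items():
        rolls_on_pages = {roll for _, roll in pages}
        if len(rolls_on_pages) > 1:
            issues.append((script_id, "pages carry different roll numbers"))
        elif rolls_on_pages != {assigned_roll.get(script_id)}:
            issues.append((script_id, "roll number differs from centre assignment"))
        for page_image_id, _ in pages:
            scripts_per_page[page_image_id].add(script_id)
    for page_image_id, owners in scripts_per_page.items():
        if len(owners) > 1:
            issues.append((page_image_id, "page attached to more than one script"))
    return issues


scripts = {
    "S1": [("img-a", "1001"), ("img-b", "1001")],
    "S2": [("img-c", "1002"), ("img-d", "1003")],  # mixed roll numbers
    "S3": [("img-a", "1001")],                     # reuses a page from S1
}
assigned_roll = {"S1": "1001", "S2": "1002", "S3": "1004"}
for item, reason in binding_issues(scripts, assigned_roll):
    print(item, "->", reason)
```

Anything this function returns would go to manual inspection before grading starts, which is the behaviour described above.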
Evidence from our case study: Zero mismatches were recorded across all 588 grading items in our audit. The binding is verified at upload, not at retrieval.
Failure mode 4: Correct answers marked zero
What students reported in 2026: Students compared their working to the official marking scheme and found matches — yet received zero marks. Without an audit trail per grading decision, there is no way to know if this was examiner error, OSM workflow error, or scan corruption.
How a properly-built AI grading system handles this: Every grading decision has an audit trail. For each question and each student, the system stores:
- The rubric line(s) the answer was matched against
- The detected working (transcribed from handwriting)
- The marks awarded with explanation
- Error tags if marks were lost (wrong setup, missing step, etc.)
A student or tutor asking "why was this marked zero?" gets an answer in seconds — not weeks via a re-evaluation portal. If the algorithm got it wrong, the audit trail surfaces the disagreement immediately.
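One way to picture the stored record is as a small data structure keyed by student and question. The sketch below is illustrative only: the field names mirror the list above, while `GradingDecision`, `AuditTrail`, and the in-memory dictionary are assumptions rather than a description of any real system's storage.

```python
from dataclasses import dataclass, field


@dataclass
class GradingDecision:
    student_id: str
    question_id: str
    rubric_lines: list[str]   # rubric line(s) the answer was matched against
    detected_working: str     # working transcribed from handwriting
    marks_awarded: float
    max_marks: float
    explanation: str
    error_tags: list[str] = field(default_factory=list)


class AuditTrail:
    """In-memory store; a real deployment would persist these records."""

    def __init__(self):
        self._decisions = {}

    def record(self, decision):
        self._decisions[(decision.student_id, decision.question_id)] = decision

    def why(self, student_id, question_id):
        """Answer "why was this marked the way it was?" for one question."""
        d = self._decisions.get((student_id, question_id))
        if d is None:
            return "no grading decision recorded"
        tags = ", ".join(d.error_tags) or "none"
        return f"{d.marks_awarded}/{d.max_marks}: {d.explanation} (error tags: {tags})"


trail = AuditTrail()
trail.record(GradingDecision(
    student_id="1001", question_id="Q4b",
    rubric_lines=["states conservation of momentum", "substitutes values"],
    detected_working="m1*u1 = m2*v2",
    marks_awarded=0, max_marks=3,
    explanation="initial momentum of second body omitted",
    error_tags=["wrong setup"],
))
print(trail.why("1001", "Q4b"))
```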
Failure mode 5: Unevaluated pages
What students reported in 2026: Some students reported entire pages with no grading marks at all — the examiner appears to have skipped them.
How a properly-built AI grading system handles this: Every expected grading decision is tracked. If the paper has 12 question parts, the system expects 12 grading decisions per student. Anything skipped surfaces in the dashboard as "incomplete grading" — it cannot silently default to zero.
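Tracking expected decisions amounts to comparing the full student-by-question grid against what was actually graded. A minimal sketch, with hypothetical names and a set of `(student, question)` pairs standing in for the decision store:

```python
def incomplete_grading(student_ids, question_ids, graded):
    """Return every (student, question) pair that has no recorded decision."""
    return [(s, q) for s in student_ids for q in question_ids if (s, q) not in graded]


students = ["1001", "1002"]
questions = [f"Q{n}" for n in range(1, 13)]  # 12 question parts

# Everything graded except one question for one student.
graded = {(s, q) for s in students for q in questions} - {("1002", "Q7")}

print(len(students) * len(questions), "decisions expected")
print(incomplete_grading(students, questions, graded))
```

A non-empty result is what would surface as "incomplete grading"; the missing pair is reported rather than scored as zero.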
Evidence from our case study: For 326 students × 12 questions = 3,912 expected grading decisions, the audit confirmed all were attempted. No silent skips.
Failure mode 6: Re-evaluation portal crashed for days
What students reported in 2026: CBSE's re-evaluation portal went down for multiple days during peak demand. Students reported login failures, payment gateway errors, and lost submissions.
How a properly-built AI grading system handles this: This is a different category of issue — infrastructure rather than grading logic. But the design choice matters:
- Modern serverless infrastructure auto-scales with demand
- Decisions are queryable from day one (there is no separately gated "re-evaluation portal")
- Payment is decoupled from grading queries (re-grading is included in the platform, not an additional fee)
The absence of a separate re-evaluation portal removes an entire failure mode.
Failure mode 7: Bright students with inexplicably low marks
What students reported in 2026: Top-of-cohort students who anticipated 95+ scores received 70s — and could not understand why. Without a per-question diagnostic, the score itself becomes the only signal, and an unexpected score becomes a mystery rather than a diagnostic.
How a properly-built AI grading system handles this: Per-student remediation reports are generated for every student. The report shows which concept each lost mark traces to, not just the final number. A "bright student" who unexpectedly loses marks gets a clear explanation of where their gap is, turning a score surprise into a teaching opportunity.
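The core of such a report is an aggregation of lost marks by concept. The sketch below assumes each grading decision already carries a concept tag and a marks-lost value; the function name, the tuple layout, and the sample concepts are made up for illustration.

```python
from collections import Counter


def remediation_report(decisions):
    """Sum lost marks per concept, largest gap first.

    decisions: iterable of (question_id, concept_tag, marks_lost)
    """
    lost = Counter()
    for _, concept, marks_lost in decisions:
        if marks_lost > 0:
            lost[concept] += marks_lost
    return lost.most_common()


student_decisions = [
    ("Q1", "kinematics", 0),
    ("Q4b", "conservation of momentum", 3),
    ("Q7", "unit conversion", 1),
    ("Q9", "conservation of momentum", 2),
]
for concept, marks in remediation_report(student_decisions):
    print(f"{concept}: {marks} marks lost")
```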
Evidence from our case study: Every one of 326 students received a personalised remediation report with per-question concept tags. No score arrived without explanation.
The pattern across all seven failure modes
Five of the seven failures trace to a single underlying issue: insufficient validation at the input stage. If image quality, page completeness, and answer-sheet binding are validated before grading begins, most of the downstream failures cannot occur. The two remaining failures (portal infrastructure and missing diagnostics) are equally addressable through modern infrastructure and per-student reporting.
This is the design difference between OSM and a purpose-built AI grading system — not the marking technology, but the validation and infrastructure decisions around it.
What this means for coaching networks watching this unfold
For coaching networks watching the OSM controversy, the practical question is: how do you give parents and students confidence in your own marking? The answers come from the same toolkit:
- Validate scans at upload, not after grading
- Verify page completeness before submission
- Maintain audit trails per grading decision
- Generate diagnostic reports, not just scores
- Use infrastructure that scales with peak demand
We have written more on what coaching centres can do specifically in this companion post.
FAQ
Are all seven OSM failure modes equally severe?
No. Blurry scans, mismatched answer sheets, and inexplicably low marks have generated the most public attention. Missing pages and unevaluated pages are individually less common but harder to detect without a per-question audit trail.
Could CBSE fix these issues in OSM without switching to AI grading?
Many of them, yes. Image quality checks, page completeness validation, and roll-number-to-page binding can all be added to OSM as workflow upgrades. As far as the public version shows, they have not been added to date.
Does IntelGrader work at board-exam scale?
IntelGrader is currently optimised for coaching centres, schools, and tutoring chains running weekly formative testing. Board-exam-scale deployments (50+ lakh papers per cycle) are not yet a use case we serve. The design principles described above apply at any scale, however.
How accurate is AI grading in the wild?
On 588 audited grading items at a CBSE coaching network, IntelGrader's effective accuracy was 94.7% vs 88.6% for the network's teachers. The methodology was conservative — we picked the worst-case scripts for AI. Full case study: here.
What's the realistic adoption path for coaching centres?
Most centres start with one batch and one subject, run AI grading in parallel with manual grading for two weeks, then decide. The 94.7% accuracy is not the main reason centres adopt; the main reason is the per-student remediation output, which no manual workflow produces at scale.



