Available for new projects

Mark Sweha

Medical Doctor · AI Operations Specialist · Theological Essayist

A hybrid practitioner combining bedside clinical training, AI evaluation operations, and healthcare education. Graduate of Zagazig Faculty of Medicine and published systematic reviewer, turning clinical expertise into defensible data that powers safer medical AI. I also write long-form essays on theology, philosophy, and depth psychology.

Remote, Egypt · Native Arabic · English C1+

See My Services · Send a sample task

About Me

Where clinical depth meets precision-engineered AI evaluation.

I'm a graduate of Zagazig Faculty of Medicine, a published systematic reviewer, and an AI training data specialist working at the intersection of clinical medicine, payer policy, and large language models. My evaluations don't just check whether an answer is "correct" — they assess whether it is legally defensible, clinically safe, and ready for real-world deployment.

I've evaluated medical prior-authorization AI responses involving oncology pharmacotherapy, molecular mutation analysis, and step-therapy compliance. I've led QA on Arabic LLM training data, audio transcription batches, and a 100-item SEC financial Q&A annotation pipeline that I designed end-to-end. I also serve as International Affairs Strategic Lead at GAIA, an Egyptian agri-food startup.

95%+

Evaluation accuracy

MB BCh

Zagazig Med graduate

51

Sources synthesized (SR)

Ar / En

Bilingual capability

Download Resume

Physician-Level Review

Graduate of Zagazig Faculty of Medicine with clinical rotations across internal medicine, surgery, pediatrics, and psychiatry. I evaluate AI outputs the way a Medical Director of Pharmacy Operations would — cross-referencing payer policies, NGS reports, and specialist notes in real time.

Clinical & Medical Research

Hands-on clinical research across behavioral medicine, personality disorders, developmental adversity, and psychometric methodology. Comfortable moving between bedside reasoning, literature synthesis, and quantitative analysis.

Elsevier Peer Reviewer & Systematic Reviewer

Invited peer reviewer for Personality and Individual Differences (Elsevier) — review accepted and recognized with an Elsevier Certificate of Reviewing (April 2026). Co-author on a PRISMA systematic review & meta-analysis on childhood emotional adversity and vulnerable narcissism — 51 studies synthesized, pooled r = 0.42, I² = 86%, Egger's test p = 0.28. Manuscript PAID-D-26-00409R1 under review at Elsevier.

AI Training & Annotation

Multi-dimensional rubrics for accuracy, reasoning, completeness, and safety. Arabic and English LLM evaluation across formal, neutral, and conversational registers.

QA Pipelines That Hold Up

Style guides, error taxonomies, rater screening tests, and rolling QA cycles — designed so every dataset entry is independently verifiable, machine-parseable, and defensible on appeal.

International Affairs — GAIA

International Affairs Strategic Lead at GAIA, an Egyptian agri-food startup. Built the full international communications package: intelligence brief, investor narrative one-pager, WFF Startup Innovation Awards 2026 application, and an Arabic-language patent protection proposal.

Theological & Philosophical Essayist

Independent long-form writing at the intersection of Patristic theology, existential philosophy, and depth psychology. Themes include the divided self, narcissism as flight from being, the structure of love, and a liturgical-psychological reading of Holy Week. Published at Divine Philosophy.

Services

What I deliver

Specialized services for AI labs, healthcare-AI startups, and annotation vendors that need clinically defensible training data and evaluations.

Medical LLM Evaluation

Physician-level review of LLM responses against payer policy, clinical guidelines, NGS reports, and specialist notes.

Clinical Hallucination Detection

Catch fabricated drug indications, wrong dosing, false guideline citations, and unsafe clinical reasoning before they ship.

Arabic LLM QA

MSA + Egyptian colloquial. Formal, neutral, and conversational registers — with structured correction pathways.

Medical Transcription QA

4-tier error taxonomy (Minor / Moderate / Significant / Critical) with timestamped citations and word-level accuracy scoring.

Annotation Guidelines

Style guides, edge-case libraries, validation rules, and rater onboarding workflows that hold up under audit.

Rubric Design

Multi-dimensional rubrics calibrated for inter-rater reliability — built around accuracy, reasoning, completeness, and patient safety.

Need medical AI outputs reviewed for clinical safety, hallucinations, or policy compliance?

Send me a sample task. I'll return a physician-level evaluation with a clear rubric, error taxonomy, and revision pathway.

Send a sample task →

Expertise

Skills & Capabilities

Cross-disciplinary competencies that let me operate at physician-level standard on AI training data, evaluation, and pipeline design.

Medical AI Evaluation

Prior Authorization Review · Clinical Reasoning Assessment · Pharmacology & Oncology · Molecular Genetics (NGS) · Payer Policy Interpretation · Patient-Safety Flagging

LLM QA & Annotation

Multi-dimensional Rubrics · RLHF Preference Ranking · Hallucination Detection · Error Taxonomy Design · Inter-Rater Reliability · Edge-Case Analysis

Arabic Language QA

MSA + Egyptian Colloquial · Formal / Neutral / Conversational Registers · Linguistic Consistency · Stylistic Alignment · Medical Translation

Project & Pipeline Design

Annotation Pipeline Architecture · Style Guide Authoring · Rater Screening (30-pt Rubric) · Rolling QA Cycles · Validation Rule Sets · Onboarding Workflows

Clinical & Medical Research

PRISMA Systematic Review · Random-Effects Meta-Analysis · ROBINS-E Risk of Bias · Psychometric Methodology · Behavioral Medicine · Critical Appraisal (PubMed / Cochrane) · Manuscript Drafting (Elsevier) · Clinical Literature Synthesis

Clinical Medicine

Internal Medicine · Pediatrics · Surgery · Psychiatry · Oncology / Hematology · Pharmacology · Patient Education

Strategy & International Affairs

GAIA: Int'l Affairs Lead · Investor Narratives · Grant & Awards Writing (WFF 2026) · Stakeholder Communications · Bilingual Strategic Writing · Patent Proposal Drafting

Tools & Platforms

Alignerr · Upwork · Google Sheets / Excel · Notion · Label Studio · ChatGPT / Claude / Gemini

Featured Work

Selected Projects

Real evaluation samples and pipeline designs from medical AI, Arabic LLM QA, audio transcription, and academic research.

Medical AI · Oncology

Medical Prior-Authorization AI Evaluation

Evaluated AI-generated prior-authorization determinations for a CML / Bosutinib case at Medical Director of Pharmacy Operations level. Cross-referenced payer policy bulletin (OHS-CML-2024-004B), NGS reports (T315I mutation, VAF 18.4%), bone marrow biopsy, and 15+ specialist notes. Identified a prescriber clinical error on Bosutinib efficacy for T315I that the weaker AI response missed entirely — and chose the response that built a legally defensible denial via verbatim policy citation and a proactive Clinical Alert.

A+++ preference call · 15+ source documents · Verbatim citation standard

Hematology/Oncology · TKI Pharmacotherapy · Step-Therapy Policy · RLHF

View sample

Project Design · 100 items

SEC Financial Q&A Dataset — Pipeline Designer & QA Lead

Designed a complete annotation pipeline for a 100-item AI training dataset of SEC 10-K / 10-Q financial Q&A pairs. Built the 3-tier category framework (A: single-fact, B: multi-fact synthesis, C: cross-document comparison), verbatim citation standard with mandatory page numbers, 30-point rater screening rubric, rolling QA cycle, and automated validation rules — all so every entry is independently verifiable and machine-parseable from day one.

100 Q&A pairs · 30-point screening · ≥95% target accuracy

Pipeline Design · Style Guides · Rater Onboarding · Validation Rules

View sample
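
As a rough illustration of what automated validation rules of this kind might look like, here is a minimal Python sketch. The field names (`question`, `answer`, `category`, `citations`, `quote`, `page`, `document`) and the two-document rule for category C are assumptions made for this example; they are not the project's actual schema.

```python
from typing import Any

# A: single-fact, B: multi-fact synthesis, C: cross-document comparison
VALID_CATEGORIES = {"A", "B", "C"}


def validate_entry(entry: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means the entry passes."""
    errors: list[str] = []

    # Question and answer text must be present and non-blank.
    for field in ("question", "answer"):
        if not str(entry.get(field, "")).strip():
            errors.append(f"missing {field}")

    if entry.get("category") not in VALID_CATEGORIES:
        errors.append("category must be A, B, or C")

    # Every entry needs at least one verbatim citation with a page number.
    citations = entry.get("citations") or []
    if not citations:
        errors.append("at least one citation required")
    for i, cite in enumerate(citations):
        if not str(cite.get("quote", "")).strip():
            errors.append(f"citation {i}: empty verbatim quote")
        page = cite.get("page")
        if not isinstance(page, int) or isinstance(page, bool) or page < 1:
            errors.append(f"citation {i}: page number must be a positive integer")

    # Assumed rule: a cross-document comparison cites at least two documents.
    if entry.get("category") == "C":
        docs = {c.get("document") for c in citations if c.get("document")}
        if len(docs) < 2:
            errors.append("category C requires citations from at least two documents")

    return errors
```

Returning every violation at once, rather than stopping at the first, lets a rater fix an entry in a single pass.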

Audio QA

Audio Transcription QA — 4-Tier Error System

Quality-reviewed a 24-file audio transcription batch (3h 42m, conversational + technical English) using a 4-tier error classification system (Minor / Moderate / Significant / Critical) with explicit accept, revise, and reject thresholds. Produced file-level accuracy scores, timestamped error citations, and targeted revision instructions — maintaining 96.8% batch accuracy, exceeding the 95% project threshold.

96.8% batch accuracy · 24 files reviewed · 4-tier error taxonomy

Transcription QA · Error Classification · Word-Level Accuracy

View sample
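
To make word-level accuracy scoring concrete, here is a minimal Python sketch that computes accuracy as 1 minus the word error rate (word-level edit distance divided by reference length) and maps it to an accept, revise, or reject decision. The 95% accept threshold comes from the project described above; the 90% revise cutoff, the lowercase whitespace tokenization, and the function names are assumptions for illustration only.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy: 1 - (edit distance between word lists / reference length)."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0

    # Standard Levenshtein distance over words, kept to two rows of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    return max(0.0, 1.0 - prev[-1] / len(ref))


def decide(accuracy: float, accept_at: float = 0.95, revise_at: float = 0.90) -> str:
    """Map a file-level accuracy to a decision. revise_at is a hypothetical cutoff."""
    if accuracy >= accept_at:
        return "accept"
    if accuracy >= revise_at:
        return "revise"
    return "reject"
```

A pure word count like this treats every error equally, which is why a severity taxonomy sits on top of it: a single wrong dose word can be Critical even when the file's accuracy is high.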

Arabic NLP · LLM

Arabic LLM Response Evaluation

Evaluated AI-generated Arabic content across formal, neutral, and conversational registers — including MSA and Egyptian colloquial. Flagged factual errors, grammatical inconsistencies, and stylistic misalignments, producing structured written feedback with specific correction pathways for data contributors.

95%+ accuracy · All registers · Bilingual feedback

Arabic NLP · Register Analysis · Linguistic QA

Peer Review · Elsevier

Elsevier Peer Reviewer — Personality and Individual Differences

Invited peer reviewer for Personality and Individual Differences (Elsevier, IF 3.7). Completed a manuscript review in April 2026 — review accepted by the editorial board and formally recognized with an Elsevier Certificate of Reviewing. Applied the same methodological discipline I bring to AI evaluation: source verification, methodological appraisal, and defensible written feedback.

Certificate of Reviewing · April 2026 · Elsevier journal (IF 3.7)

Peer Review · Methodological Appraisal · Editorial Feedback · Personality Psychology

Research · Elsevier

Systematic Review & Meta-Analysis — Elsevier

Co-authored a PRISMA-compliant systematic review on childhood emotional adversity and vulnerable narcissism, currently under review at Personality and Individual Differences (Elsevier, PAID-D-26-00409R1). Led full-text screening, dual extraction, ROBINS-E risk-of-bias assessment, random-effects meta-analysis, and manuscript drafting. Synthesized 51 studies — pooled r = 0.42, I² = 86%, Egger's test p = 0.28.

51 studies · Pooled r = 0.42 · Under review at Elsevier

PRISMA · Random-Effects Meta-Analysis · ROBINS-E · Manuscript Drafting

Clinical Research

Clinical Research — Behavioral Medicine

Ongoing literature reviews and research synthesis at the intersection of clinical medicine and psychology — covering personality disorders, developmental adversity, psychometric methodology, and behavioral medicine. Comfortable with PubMed search strategy, critical appraisal, and translating findings for both academic and clinical audiences.

3 active research domains · Bilingual literature · PubMed / Cochrane workflow

Literature Review · Critical Appraisal · Psychometrics · Clinical Synthesis

Strategy · GAIA

GAIA — International Affairs Strategic Lead

Built the full international communications package for GAIA, an Egyptian agri-food startup: country/market intelligence brief, investor narrative one-pager, WFF Startup Innovation Awards 2026 application materials, and an Arabic-language patent protection proposal. In this role I translate complex scientific and business concepts into clear narratives for international funders, partners, and stakeholders.

WFF 2026 application · Investor one-pager · Patent proposal (AR)

Strategic Writing · Grant / Awards · Stakeholder Comms · Bilingual Narrative

Methodology

Medical LLM Evaluation Rubric

Authored a 6-dimension rubric for evaluating AI responses in medical utilization review: Policy Precision, Evidence Specificity, Error Detection, Completeness, Actionability, and Patient-Safety Priority — drawn from real-world PA adjudication standards and payer policy requirements.

6 quality dimensions · Real-world standard · Reusable framework

Rubric Design · Quality Standards · Documentation
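
One way a six-dimension rubric like this could be applied to a pair of AI responses is sketched below in Python. The dimension names come from the rubric above; the 1 to 5 scale, the equal weighting, and the tie-break on Patient-Safety Priority are assumptions for illustration and not the rubric's actual scoring rules.

```python
DIMENSIONS = (
    "policy_precision",
    "evidence_specificity",
    "error_detection",
    "completeness",
    "actionability",
    "patient_safety_priority",
)


def overall_score(scores: dict[str, int]) -> float:
    """Equal-weight mean across the six dimensions (assumed 1-5 scale)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 1 <= scores[d] <= 5:
            raise ValueError(f"{d} must be between 1 and 5")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def prefer(a: dict[str, int], b: dict[str, int]) -> str:
    """Return 'A', 'B', or 'tie' for a pairwise preference call."""
    score_a, score_b = overall_score(a), overall_score(b)
    if score_a != score_b:
        return "A" if score_a > score_b else "B"
    # Assumed tie-break: the response that scores higher on patient safety wins.
    safety_a = a["patient_safety_priority"]
    safety_b = b["patient_safety_priority"]
    if safety_a != safety_b:
        return "A" if safety_a > safety_b else "B"
    return "tie"
```

Scoring each dimension separately keeps the written rationale tied to a named criterion, so two raters who disagree can see exactly where they diverge.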

Research & Peer Review

Elsevier peer reviewer & published systematic reviewer.

My research practice is the same engine behind my AI evaluation work: disciplined sourcing, methodological rigor, and the ability to defend every claim against the next reviewer in line.

April 2026

Elsevier Certificate of Reviewing — Personality and Individual Differences, awarded to Mark Sweha

Certificate of Reviewing — Elsevier

Personality and Individual Differences · awarded for a review contributed to the journal in April 2026.

About the journal

Elsevier Peer Reviewer

Personality and Individual Differences

Invited reviewer for Personality and Individual Differences (Elsevier, IF 3.7). Completed a peer review in April 2026 — review accepted by the editorial board and formally recognized with an Elsevier Certificate of Reviewing.

Published Systematic Reviewer

PRISMA · Meta-Analysis

Co-authored a PRISMA-compliant systematic review and random-effects meta-analysis on childhood emotional adversity and vulnerable narcissism — 51 studies synthesized, pooled r = 0.42, I² = 86%, Egger's test p = 0.28. Manuscript PAID-D-26-00409R1 under review at Elsevier.

Clinical Research Practice

Behavioral Medicine · Psychometrics

Active research across personality disorders, developmental adversity, and psychometric methodology. Comfortable with PubMed search strategy, ROBINS-E risk-of-bias appraisal, dual extraction, and translating findings for clinical and academic audiences.

Methodological Range

Quant + Qual

From bedside reasoning to literature synthesis to quantitative pooling — I move fluidly between clinical case formulation, critical appraisal, and statistical interpretation. Bilingual literature workflow (Arabic & English).

Writing

Also a theological & philosophical essayist.

Outside the clinic and the evaluation queue, I write long-form essays at the intersection of Patristic theology, existential philosophy, and depth psychology. The essays read Scripture, the Church Fathers, and the human person as one continuous text — asking how shame, love, narcissism, and resurrection actually work inside a real life.

Published in Arabic at Divine Philosophy.

Patristic Roots

Rooted in the Alexandrian and Cappadocian fathers — Clement, Origen, Gregory of Nyssa, and Maximus the Confessor — recovering the mystical-philosophical line that often gets lost behind later polemics.

Existential Register

Theology read through being, freedom, shame, and the divided self. Less doctrinal exposition, more an attempt to ask what it actually means to exist before God and before another person.

Depth-Psychological Lens

Narcissism as flight from being, the self-image formed in childhood, the mechanics of love and rejection. The clinical eye and the contemplative eye reading the same human at the same time.

Conversational Arabic Prose

Long-form essays in colloquial-leaning Arabic — written as if thinking out loud with the reader. Slow rhythm, parenthetical asides, and a refusal to flatten complexity into slogans.

Featured Essays

View all essays

Liturgical Reading · April 7, 2026

Monday of Holy Pascha (اثنين البصخة)

A liturgical reading of Holy Week: how the Church linked the creation of the world, the vineyard in the Song of Songs, and the cursed fig tree. The whole theme is shame before oneself, and the attempt to cover nakedness with fig leaves instead of entering communion with the other.

Read on Divine Philosophy

Patristics & Psychology · November 29, 2025

Do We Really Understand Love? (1) Narcissism (هل نفهم الحب حقًا؟ — (١) النرجسية)

Narcissism at its core is not love of the self but fear of it: fear of sitting face to face with oneself, and dread of arriving at emptiness. A fresh reading of the myth of Narcissus, away from the common interpretation.

Read on Divine Philosophy

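Existential Theology · November 15, 2025

The Desire for Being (شهوة الوجود)

Every desire is, in truth, a longing for being: for that distant thing which I do not possess. Hatred of existence is hatred of the self within it. The first step toward a resolution is incarnation: that a person accept their existence as it is.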

Read on Divine Philosophy

Patristic Theology · October 30, 2025

Deadness (1) (الموات (١))

Deadness is not a biological process, and you will not grasp that until you love. A return to the tradition of the philosopher Fathers (Clement, Origen, Gregory of Nyssa, and Maximus the Confessor) and to the mystical way.

Read on Divine Philosophy

Browse all essays · Visit Divine Philosophy

Contact

Let's build safer medical AI together.

Need medical AI outputs reviewed for clinical safety, hallucinations, or policy compliance? Send me a sample task — I'll return a physician-level evaluation with a clear rubric and revision pathway.

Send me a sample task

mark.magdy.amir@gmail.com