# scHiCAR External Open-Discovery Graph Report

## Purpose

This report accompanies:

- `schicar_external_open_discovery.graph.json`
- `schicar_external_open_discovery.html`

The goal is to test a Graphify-style output pattern for this project:

```text
graph.json + graph.html + report.md
```

The JSON is the machine-readable source of truth. The HTML is the dynamic network viewer. This report is the human-readable audit layer.
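Because the JSON is the source of truth, a quick summary of its contents is a useful first audit step. The sketch below is only illustrative: the schema is informal, so the top-level `nodes` and `edges` keys and the per-node `evidence_level` field are assumptions, not the confirmed layout of `schicar_external_open_discovery.graph.json`.

```python
import json
from collections import Counter
from pathlib import Path


def load_graph(path: str) -> dict:
    """Read the graph JSON file into a plain dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize(graph: dict) -> dict:
    """Count nodes and edges, and tally nodes per evidence level.

    Assumed (hypothetical) layout: graph["nodes"] and graph["edges"] are
    lists, and each node may carry an "evidence_level" string.
    """
    nodes = graph.get("nodes", [])
    edges = graph.get("edges", [])
    # Nodes without the assumed field are counted as "UNSPECIFIED".
    levels = Counter(node.get("evidence_level", "UNSPECIFIED") for node in nodes)
    return {"nodes": len(nodes), "edges": len(edges), "evidence_levels": dict(levels)}
```

If the real file follows this layout, `summarize(load_graph("schicar_external_open_discovery.graph.json"))` would give a compact overview to compare against the tables in this report.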

## Source Inputs

- Seed graph: `schicar_dataset_network_draft.md`
- Open-discovery validation: `validation/add-schicar-external-nodes_open_discovery.md`
- scHiCAR extraction validation: `validation/extract-paper-dataset-network_schicar.md`
- Takei 2025 extraction validation: `validation/extract-paper-dataset-network_takei2025.md`

## Evidence Levels

| Evidence level | Meaning in this prototype |
|---|---|
| `EXTRACTED` | Directly extracted from a paper, repository, PubMed metadata, GEO/Zenodo metadata, or an existing validation report. |
| `INFERRED` | Compatibility is inferred from extracted metadata, such as organism, tissue, modality, feature space, and integration task. |
| `AMBIGUOUS` | Candidate or relationship is plausible but requires additional evidence before being treated as an external node or strong compatibility edge. |
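The three levels can be treated as a closed vocabulary when validating the JSON. The following is a minimal sketch; the class and function names are hypothetical and are not part of the current prototype.

```python
from enum import Enum


class EvidenceLevel(str, Enum):
    """Evidence levels from the table above; values match the report's labels."""

    EXTRACTED = "EXTRACTED"  # taken directly from a paper, repository, or metadata record
    INFERRED = "INFERRED"    # compatibility inferred from extracted metadata
    AMBIGUOUS = "AMBIGUOUS"  # plausible, but needs additional evidence


def parse_evidence_level(label: str) -> EvidenceLevel:
    """Convert a raw label to an EvidenceLevel, rejecting unknown labels."""
    try:
        return EvidenceLevel(label.strip().upper())
    except ValueError:
        raise ValueError(f"unknown evidence level: {label!r}") from None
```

Rejecting unknown labels early keeps a typo such as `INFERED` from silently creating a fourth level.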

## Included External Nodes

| Node | Why included | Evidence level |
|---|---|---|
| `AllenWMB.MERFISH` | Strong spatial anchor candidate for scHiCAR brain RNA-derived states. | `EXTRACTED` node; `INFERRED` compatibility |
| `BICCN_MOp.scRNA` | Strong cortical label-transfer reference. | `EXTRACTED` node; `INFERRED` compatibility |
| `BICCN_MOp.scATAC` | Strong cortical regulatory-reference candidate. | `EXTRACTED` node; `INFERRED` compatibility |
| `Takei2025.RNAseqFISH` | Medium-strength cross-region brain spatial transcriptome reference. | `EXTRACTED` node; `INFERRED` compatibility |
| `Takei2025.DNAseqFISH` | Medium-strength imaging-derived 3D genome reference. | `EXTRACTED` node; `INFERRED` compatibility |
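Each included node carries two separate evidence levels: one for the node itself and one for its compatibility with scHiCAR. A record type that keeps them apart might look like the sketch below; the field names are assumptions and may not match the keys used in the graph JSON.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalNode:
    """One row of the table above (field names are hypothetical)."""

    node_id: str
    node_evidence: str           # evidence that the dataset exists as described
    compatibility_evidence: str  # evidence for its compatibility with scHiCAR


# The five included nodes, with the evidence levels listed in the table.
INCLUDED_NODES = [
    ExternalNode("AllenWMB.MERFISH", "EXTRACTED", "INFERRED"),
    ExternalNode("BICCN_MOp.scRNA", "EXTRACTED", "INFERRED"),
    ExternalNode("BICCN_MOp.scATAC", "EXTRACTED", "INFERRED"),
    ExternalNode("Takei2025.RNAseqFISH", "EXTRACTED", "INFERRED"),
    ExternalNode("Takei2025.DNAseqFISH", "EXTRACTED", "INFERRED"),
]


def with_compatibility(nodes: list[ExternalNode], level: str) -> list[str]:
    """Return the ids of nodes whose compatibility has the given evidence level."""
    return [n.node_id for n in nodes if n.compatibility_evidence == level]
```

Keeping the two fields separate makes it explicit that every compatibility claim in this table is currently inferred, even though the nodes themselves are extracted.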

## Candidate Nodes Requiring More Evidence

| Node | Current status | Why not promoted yet |
|---|---|---|
| `MuscleRegen.scRNA` | `needs_accession_check` | A PubMed search surfaces plausible candidate studies, but accessions and actual file formats were not verified. |
| `MuscleRegen.scATAC` | `needs_accession_check` | The task fit is strong, but the existence of suitable scATAC or multiome data still needs verification. |
| `sc3DGenome.RNA3D` | `weak_or_method_reference` | Modality fit is relevant, but tissue and biological context are not yet verified for direct integration. |
| `NeuronHiC.megadomain` | `weak_or_contact_reference` | Potential scHi-C reference, but observation level, resolution and tissue context need checking. |
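Grouping the candidates by status turns the table above into a verification to-do list. This is a minimal sketch: the status strings are taken from the table, while the function name and the dictionary layout are assumptions.

```python
from collections import defaultdict

# Candidate node -> current status, copied from the table above.
CANDIDATES = {
    "MuscleRegen.scRNA": "needs_accession_check",
    "MuscleRegen.scATAC": "needs_accession_check",
    "sc3DGenome.RNA3D": "weak_or_method_reference",
    "NeuronHiC.megadomain": "weak_or_contact_reference",
}


def group_by_status(candidates: dict[str, str]) -> dict[str, list[str]]:
    """Group candidate node ids by status, preserving table order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for node_id, status in candidates.items():
        grouped[status].append(node_id)
    return dict(grouped)
```

For example, `group_by_status(CANDIDATES)["needs_accession_check"]` lists the two muscle regeneration candidates that still need an accession and format check.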

## Excluded Candidate

| Dataset | Reason |
|---|---|
| scHiCAR | It is the seed dataset (DOI `10.1038/s41587-026-03013-7`, GEO `GSE305889`), so it is not treated as an external node. |

## How To Read The Dynamic Viewer

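- GitHub's file browser shows `.html` files as source code. To preview the dynamic viewer, run `python3 -m http.server 8765` in the directory that contains the HTML file, then open `http://localhost:8765/schicar_external_open_discovery.html`.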
- Solid edges are seed-internal scHiCAR modality relationships.
- Dashed edges are compatibility or reference relationships to external datasets.
- Strong, medium, weak and candidate edges are visually distinct.
- Clicking a node or edge shows metadata, evidence level, source, and enabled question when available.
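The local preview step can also be scripted with Python's standard library. The sketch below is one possible equivalent of the `http.server` command, assuming the HTML file sits in the directory being served; `viewer_url` and `serve` are hypothetical helper names.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

VIEWER_FILE = "schicar_external_open_discovery.html"


def viewer_url(port: int = 8765, filename: str = VIEWER_FILE) -> str:
    """Build the local URL at which the viewer is reachable."""
    return f"http://127.0.0.1:{port}/{filename}"


def serve(directory: str = ".", port: int = 8765) -> None:
    """Serve `directory` on the loopback interface until interrupted."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    with ThreadingHTTPServer(("127.0.0.1", port), handler) as server:
        print(f"Open {viewer_url(port)} (Ctrl+C to stop)")
        server.serve_forever()
```

Calling `serve()` from the directory that holds the report files starts the server; binding to `127.0.0.1` keeps the preview reachable only from the local machine.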

## Current Limitations

- This is a prototype output, not a completed systematic review.
- The dynamic viewer is intentionally lightweight and does not yet provide graph queries.
- The JSON schema is informal; a later branch can convert it to LinkML after the structure stabilizes.
- Candidate muscle regeneration datasets need a focused accession and format verification pass.
