### Localities sampled

Ten localities on Gotland, Sweden, were sampled and processed for ophiuroid microfossils (Fig. 1; Supplementary Figs 1, 2).

Ireviken30,31: Llandovery and Wenlock series, Telychian and Sheinwoodian stages, Pterospathodus amorphognathoides through Upper Kockellela ranuliformis conodont zones, Lower Visby Formation through Högklint Formation (unit a). Ophiuroid sample is from the middle–upper portion of the Lower Visby Formation.

Lickershamn30,31: Wenlock Series, Sheinwoodian Stage, Lower Pterospathodus procerus through Upper Kockellela ranuliformis conodont zones, Lower Visby Formation (unit e) through Högklint Formation (unit b). Ophiuroid sample is from the middle part of the Upper Visby Formation.

Brissund32,33: Wenlock Series, Sheinwoodian Stage, upper Kockellela ranuliformis conodont Zone, Högklint Formation “undifferentiated”.

Tjälderviken (see Tjeldersholm for locality references)30,31: Wenlock Series, Whitwell Stage, Ozarkodina sagitta sagitta conodont Zone, Slite Group (upper part—“Pentamerus gothlandicus beds” or equivalents).

Svarvare30,31: Wenlock Series, Whitwell Stage, Ozarkodina bohemica longa conodont Zone, Slite Group, Fröjel Formation, Gannarve or Svarvare member.

Mulde Tegelbruk30,31: Wenlock Series, Gleedon Stage, Ozarkodina bohemica longa conodont Zone, Halla Formation, Mulde Brick clay member, Gothograptus nassa-Pristioprion dubius graptolite Interregnum, Mulde Event, fauna 4.

Gannor30,31: Ludlow Series, Ludfordian Stage, uppermost part of the Polygnathoides siluricus conodont Zone or lowermost part of the Lower Icriodontid conodont subzone sensu Jeppsson34, Hemse Group, När Formation, Botvide Member or the lowermost Eke Formation.

Lau backar38,39: Ludlow Series, Ludfordian Stage, uppermost part of the Lower Icriodontid conodont subzone sensu Jeppsson34, Eke Formation.

Petsarve30,31: Ludlow Series, Ludfordian Stage, Upper Icriodontid conodont subzone sensu Jeppsson34, Eke Formation, upper part.

Hoburgen30,31,35: The Hoburgen area comprises several mapped geological localities spanning the Ludlow Series, Ludfordian Stage, Ozarkodiana snajdri through O. crispa conodont zones, uppermost Burgsvik Formation through the Sundre Formation. The ophiuroid sample is from weathered material within a cave adjacent to the so-called Hoburgsgubben (“Hoburg man”) sea stack; Hamra or Sundre Formation.

In all the samples, the ophiuroid material was fully disarticulated, with very few exceptions from the Mulde Tegelbruk sample (Fig. 1, Supplementary Fig. 4). Furthermore, we assessed the existence of pre-burial sorting in the samples by semi-quantitative examination of the plate type proportions for various asterozoan and pelmatozoan echinoderm groups. In all cases, relative abundances of skeletal elements reflected their expected anatomical frequency irrespective of their hydrodynamic properties (shape, size). The plate types that are the most abundant in the skeleton of a given echinoderm taxon were also the most abundant in the samples, and no particular plate type was over- or underrepresented with respect to their anatomical frequency. We therefore exclude the possibility that the studies skeletal plates underwent pre-burial sorting.

### Geological context and event stratigraphy

The island of Gotland, Sweden, exposes stacked generations of carbonate platforms formed in shallow marine settings along the northwestern margin of the intracratonic Baltic basin during the Silurian Period12,36,37. The rock succession is characterized by a nearly complete absence of tectonic overprinting and only minor dolomitization occurs. These characteristics combined with a palaeolatitudinal setting of the island of c. 20° south of the equator, have resulted in rich, diverse and exceptionally preserved tropical fossil faunas. The entire succession is sub-divided into a number of formations and groups, ranging from the upper Llandovery Lower Visby Formation through the upper Ludlow Sundre Formation (Fig. 1; Supplementary Fig. 2)22,36. The facies belts change along outcrop strike, from proximally formed limestones in the northeast to sparsely graptolitic, more distally formed marls in the southwest. The stratigraphic completeness combined with excellent fossil preservation, and the long history of geological investigations has produced a very high-resolution biostratigraphic framework, primarily based on conodonts22.

During the last few decades, the Silurian Period has emerged as a highly dynamic and climatically unstable time during Earth History, encompassing a whole suite of biogeochemical events38,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40. In the Gotland succession alone, high-resolution studies primarily based on conodonts enabled the identification of up to eight such events; the most prominent ones being the Ireviken Event, Mulde Event, and Lau Event (Fig. 1)12,22,23,41,42. These latter events, characterized by rapid and profound shifts in the oxygen and carbon isotope signatures, as well as faunal and sedimentary facies restructurings, have been shown to have a very widespread, if not global, geographic occurrence22,23,39.

The Homerian (middle Silurian) Mulde Event (Fig. 4), which is of particular interest in this study, was first identified on Gotland42, based on conodont fauna re-organizations. However, prior to this a major Homerian extinction event among graptolites had been noted in other parts of the world, with as much as 95% of all species going extinct in some regions11,39. Therefore, parts of the Mulde Event are known also under different (“graptolite-based”) names, such as the “lundgreni Event” and the “Big Crisis”12, the latter being a term recently used “to refer to the sum total of all global change during this interval, not only the event as it pertains to graptolites”11.

On Gotland, the Mulde Event begins at the base of the Fröjel Formation and ends at the top of the Halla Formation (Fig. 4). In addition to stepwise extinctions of graptolites and conodonts (as well as other taxa that have not been as intensely studied) during three datum points (Datum 1. 1.5. and 2), the Mulde Event is characterized by sedimentary changes and major stable isotope perturbations recorded on widely separated continents39. The triggering mechanism/s of the event have been debated, but are generally thought to be linked to the relationship between, and changes in, ocean circulation, primary production, and the global carbon cycle23,39,41,43.

### Sample treatment

Bulk sediment samples taken by one of us (M.K.) were screen-washed using tap water. Ophiuroid microfossils were picked from dried sieving residues under a dissecting microscope. Selected specimens were cleaned in an ultrasonic bath, mounted on aluminum stubs and gold-coated for scanning electron microscopy (SEM) using a Jeol Neoscope JCM-5000. Skeletal plates of the recent ophiuroid Ophiopholis aculeata used for comparison with the Gotland ophiuroids were extracted from an articulated individual using household bleach, rinsed in tap water and mounted for SEM8. All figured specimens are deposited in the collections of the Natural History Museum Luxembourg (MnhnL).

### Nomenclatural Acts

This published work and the nomenclatural acts it contains have been registered in ZooBank, the proposed online registration system for the International Code of Zoological Nomenclature (ICZN). The ZooBank LSIDs (Life Science Identifiers) can be resolved and the associated information viewed through any standard web browser by appending the LSID to the prefix “http://zoobank.org/”. The LSIDs for this publication are: urn:lsid:zoobank.org:pub:0E3C9E87-C337-4509-B02C-6053492B814B, for the publication, urn:lsid:zoobank.org:act:E378D576-4B79-45CC-A9B3-C46999D1A753 for Muldaster, urn:lsid:zoobank.org:act:01ED8A54-507D-4709-8E38-E841C021F879 for M. haakei, urn:lsid:zoobank.org:act:395AAF1A-087F-4897-9B2C-CE0132284EDC for Ophiopetagno, and urn:lsid:zoobank.org:act:A705CAD9-B96B-448C-A0C7-5C293D03ABB5 for O. paicei.

### Phylogenetic analysis

#### Bayesian inference analysis

The phylogenetic position of Ophiopetagno paicei and Muldaster haakei was evaluated with Bayesian-inference analyses applied to a data matrix of 25 taxa and 68 characters (Supplementary Methods, Supplementary Data 1). The set of taxa includes representatives of all major Paleozoic ophiuroid groups, six living species representing the three major extant ophiuroid clades, and the best known Paleozoic modern-type ophiuroid species. The outgroup taxon was chosen because the Stenurida are widely accepted as sister to the Ophiuroidea, irrespective of whether they represent a class of their own or a paraphyletic complex at the base of the Ophiuroidea14. Pradesura jacobi was chosen as one of the best known stenurid taxa.

All characters were unordered in the absence of evidence for a clear ontogenetic or size-related progression between character states. Scoring was carried out independently by two of us (BT and LNT) and then compared and checked for consistency. The only inconsistency pertained to character 48, resulting from an uncertain interpretation of the presence or absence of a constriction in the lateral arm plates of some Paleozoic taxa. Consistency of scoring was restored through a case-by-case revision of the concerned taxa, rigorously applying the definition of constriction in lateral arm plates as previously provided7,8. When character states could not be assessed due to poor preservation or lack of data, the character was scored with a “?”. If a character was inapplicable in a taxon, e.g. pertaining to a structure which was absent in that particular taxon, the character was scored with a “-“.We used the software Xper244 to assemble our matrix.

Bayesian inference was performed with MrBayes44 using MCMC with default parameters for morphological data8,46. MrBayes uses a modified version of the Juke-Cantor model for morphological data for binary and multi-state characters45,46. Only variable characters were sampled, omitting characters that have the same state for all examined taxa, and we compensated for character selection bias46. We assumed that all character states have equal frequency, that prior probabilities were equal for all trees, and that evolutionary rates vary between sites according to a discrete gamma distribution. Branch lengths were estimated using compound Dirichlet priors, which first specify a tree length using a gamma distribution and subsequently uses a Dirichlet distribution to partition the tree length into branch lengths47. Additional analyses indicate the tree topology was unaffected by the number of chains (four vs. eight) and changes in temperature of the MCMC analysis. It has been shown that likelihood-based methods such as Bayesian statistics outperform parsimony under these circumstances and produce more reliable tree hypotheses46. Trees (and other parameters of the model) are sampled from the corresponding marginal PP distributions. We summarized the posterior distribution of trees by computing a majority rule consensus tree. Average standard deviations of split frequencies stabilized at about 0.007–0.01 after 3 million generations, sampled every 1000 generations. The first 25% of the trees were discarded as burnin. The consensus trees were examined with the software FigTree v. 1.4.2 by Rambaut (http://tree.bio.ed.ac.uk/software/figtree/). As is common standard in statistics, we regard Bayesian credibility intervals of 95–99% as strong support, and at least 90% probability as good support (Supplementary Fig. 5).

#### Tip-dated phylogenetic analysis and rates of discrete character evolution

A tip-dating analysis using the sampled-ancestor implementation of the fossilized birth death process (SA-FBD)48,49,50,51 was conducted to obtain a posterior distribution of time-calibrated phylogenies used for downstream analysis of macroevolutionary inferences. The analysis used the same character matrix as the undated phylogenetic analyses described above.

As described for the undated analysis, we used the Mk model of character evolution52. Rate variation among characters was modeled using a lognormal distribution53. Rate variation among lineages was modeled using an independent gamma rates (IGR) clock model with a broad uniform prior spanning multiple orders of magnitude. We used a time-varying implementation of the SA-FBD model where macroevolutionary and sampling parameters are estimated for each geological stage, resulting in a time-series of speciation (λ), extinction (μ), and fossil sampling and recovery (ψ) rates54,55,56. A wide, uniform prior distribution spanning the middle Cambrian to Ordovician was placed on the age of the most recent common ancestor of the clade (i.e., the tree age). Additional details concerning the choice of prior distributions used are discussed in the Supplementary Methods. Markov-chain Monte Carlo analysis was performed in MrBayes for 10 million generations with the first 25% of samples discarded as burn-in. Estimated sample sizes for all parameters were >300, with most >1000, and the potential scale reduction factor (PSRF) was ~1.00 for all parameters. The total number of post-burnin time-calibrated trees used in downstream analyses was 1500.

The relaxed clock model of morphological evolution employed in the tip-dating analysis was simultaneously used to estimate the mean per-branch rate of discrete character evolution across the phylogeny (see Wright55 for a similar analysis in fossil crinoids and featuring a detailed description of methods). A time-calibrated maximum clade credibility tree with the distribution of mean rates among branches is shown in Fig. 3.

#### Completeness of the ophiuroid fossil record and sampling ancestor-descendant relationships

The completeness of the fossil record57 can be quantified as the proportion of extinct taxa with a fossil record that has been sampled at least once. Because our study involves a possible case documenting evolutionary change spanning a speciation event (or sequence of speciation events) in the fossil record, we explored the completeness of the ophiuroid fossil record over time and tested whether sampling ancestor-descendant pairs should be expected from that record, particularly during the Silurian. Formally, completeness can be calculated as:

$$varPhi =psi /(mu +psi )$$

(1)

where ψ and μ are the fossil sampling and extinction rates, respectively (see Eqs. 1a,1b in the appendix of Foote58. Using the posterior sample of parameter estimates from the tip-dated analysis, we calculated Φ for each geological stage using parameters estimated from the SA-FBD model. Completeness metrics for the Ordovician—Permian stages are depicted in Supplementary Fig. 6.

The recovery of sampled ancestors implies either direct anagenetic transformation or “budding” cladogenesis (particularly if species temporal ranges overlap57,59). The probability of sampling an ancestor-descendant pair (AD) can be calculated as Pr[AD] = Φ2,58.

To evaluate empirical support for whether Ophiopetagno paicei and/or Muldaster haakei represent sampled ancestors (i.e., fossil morphotaxa that are either directly or indirectly ancestors to another fossil morphotaxon57), we calculated the PP of ancestor-descendant relationships by dividing the frequency in which a morphotaxon was placed as an ancestor in the posterior distribution of time-calibrated trees by the total number of trees sampled (N = 1500 phylogenies).

#### Sensitivity analyses of phylogenetic and macroevolutionary inferences

To evaluate whether our phylogenetic and macroevolutionary inferences are sensitive to variation in assumptions and choice of prior distributions, we conducted a series of six additional tip-dated analyses that vary in choice of one or more of the following: FBD priors, constant vs. time heterogeneous rates of sampling and diversification, and clock model priors (Supplementary Methods, Supplementary Table 2). In particular, we explored whether our results are robust to alternative priors and assumptions regarding fossil sampling. Careful attention to fossil sampling is necessary for several reasons. For example, parameter identifiability in birth-death-sampling models may be a concern for methods simultaneously estimating origination, extinction, and sampling rates, but identifiability can be improved when parameter constraints are available60. In addition, we could not plausibly include all known Phanerozoic brittle star taxa in our phylogenetic analysis, but highly incomplete taxon sampling may bias FBD parameter estimation, which in turn may influence macroevolutionary inferences, including ancestor-descendant probabilities. To address these concerns, we conducted several analyses that constrained the fossil sampling rate to reflect values estimated using traditional palaeontological methods (i.e., independent of phylogeny). To obtain an empirically informed prior for the sampling rate, we downloaded all Phanerozoic occurrences of ophiuroid taxa from the Paleobiology Database (paleobiodb.org) and estimated per-interval sampling probabilities using three-timer methods61,62. These per-interval sampling probabilities were then converted to rates, and the mean and variance of the Phanerozoic sampling rate distribution were used to specify an empirically informed Beta distribution as a prior on the FBD fossil sampling rate (Supplemental Methods). Furthermore, we also conducted analyses varying clock model priors, such as calibrating the variance parameter of the IGR model and placing empirically informed estimates on the prior specifying the base rate of the clock63. All sensitivity analyses recovered identical phylogenetic positions and macroevolutionary inferences for key taxa described here as those recovered by the initial analysis, indicating our results are robust to a wide range of variation in choice of assumptions and prior distributions (Supplementary Methods).

### Plate size analysis

In order to approximate the body size trends of the Gotland ophiuroids (Fig. 1), we measured the average lateral surface area of the lateral arm plates. We assume that the area of the lateral arm plates is correlated to the size of the arm but not necessarily the diameter of the disc nor the length of the arm. Length and width of the lateral arm plates were measured graphically using the software imageJ. Plate counts, average values and standard deviation for every sampled locality are given in Supplementary Table 1 and Supplementary Data 2. Because Hoburgen yielded only a single lateral arm plate (of very large size compared to the others), we excluded this sample from our analysis.

To assess how robust the results of our time-series analysis are to sampling artifacts and other issues concerning the fidelity of the fossil record, we implemented a significance test using Monte Carlo simulations. We simulated arm plate size data for each sample locality by drawing values from normal distributions having the mean and variance of the empirical data, with the number of fossil occurrences per site equal to the sample size of that site. Note the assumption of normally distributed variation is standard for quantitative characters such as size, which are often influenced by many loci with small effects. However, because the spatial and temporal distribution of fossils can be biased by a number of non-biological factors (e.g., taphonomic processes), we simulated 10,000 hypothetical fossil records and permuted fossil occurrences across times and across localities to investigate a worst-case scenario where patterns of body size change have virtually no meaningful evolutionary signal. For each simulated record, we then examined the patterns of change across the three palaeoenvironmental crisis events (Ireviken, Mulde and Lau Events) to generate null distributions of net change across each event. For each crisis interval i, we calculated the change in size (Δi) as the difference between mean arm plate area of the oldest and youngest assemblages associated with that event, and compared empirical values of Δi to the null distribution of size changes to calculate p-values associated the observed patterns (Supplementary Fig. 7).

### Statistical methods and reproducibility

Phylogenetic analyses were conducted in with MrBayes44. All Bayesian PP values of 95–99% were interpreted as indicated strong support, ≥90% PP as good support, and ≥50% as positive support. For the Monte Carlo test, p-values <0.05 were considered significant.