rST parser: restore backwards compatibility of nested parsing.

* Keep document-wide title style hierarchy in nested parsing.
* Revert to using `document.memo.section_level` (required for nested
  parsing into a detached base node with document-wide title styles).

+ simpler logic
+ backwards compatible
- more bookkeeping effort
- mandating a document-wide section title style hierarchy is ill-suited
  for inclusion of rST blocks from external sources (e.g. extracted
  docstrings). See `sphinx.util.parsing.nested_parse_to_nodes()` for an
  alternative.

This should restore compatibility with Sphinx's "only" directive broken
by [r10204].

git-svn-id: http://svn.code.sf.net/p/docutils/code/trunk@10226 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
2025-10-06 00:32:41 +02:00 · 2025-09-05 09:10:58 +00:00
parent f00f836e28
commit 570615a844
4 changed files with 206 additions and 176 deletions
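
The effect of a document-wide title style hierarchy on nested parsing can be illustrated with a small stand-alone sketch. It is not the Docutils implementation: `resolve_section_level()` is a hypothetical helper, title styles are simplified to single underline characters, and the lookup of the new parent node (which can fail for a detached base node) is left out. It only mimics the level bookkeeping that this commit moves back to the shared `memo`.

```python
from __future__ import annotations


def resolve_section_level(title_styles: list[str], current_level: int,
                          style: str) -> int:
    """Return the section level for a title using `style`.

    `title_styles` stands in for the shared, document-wide list of
    styles (index 0 is level 1); it is extended in place when a new
    style is seen. Hypothetical helper, not part of Docutils.
    """
    try:
        new_level = title_styles.index(style) + 1  # style already known
    except ValueError:
        new_level = len(title_styles) + 1          # new style: next level
    if new_level > current_level + 1:
        # A sub-section may only be one level below the current section.
        raise ValueError(f'Inconsistent title style: skip from level '
                         f'{current_level} to {new_level}.')
    if new_level > len(title_styles):
        title_styles.append(style)
    return new_level


# Styles established by "sec1", "sec1.1" and the first nested block
# of the first test case:
styles: list[str] = []
level = 0  # 0 document, 1 section, ...
for style in ['=', '-', '*', '~']:
    level = resolve_section_level(styles, level, style)

# "sec2" returns to level 1; the nested block then starts with '*',
# which is level 3 in the document-wide hierarchy:
level = resolve_section_level(styles, level, '=')
message = ''
try:
    resolve_section_level(styles, level, '*')
except ValueError as err:
    message = str(err)
print(message)
```

Because the list of styles is shared between the outer and the nested parser, the nested block cannot start its own hierarchy; under these assumptions the sketch reports the same kind of level skip as the "skipping2.1" title in the updated test data.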
--- a/docutils/HISTORY.rst
+++ b/docutils/HISTORY.rst
@@ -29,17 +29,14 @@ Release 0.23b0 (unpublished)
 * docutils/parsers/rst/states.py

  - Relax "section title" system messages from SEVERE to ERROR.
-  - Ensure new "current node" is valid when switching section level
-    (cf. bugs #508 and #509).
-  - Use a `separate title style hierarchy for nested parsing`__.
+  - Revert to using `document.memo.section_level` to fix behaviour with
+    nested parsing into a detached node (cf. bugs #508 and #509).
  - Set `parent_state_machine` attribute when creating nested state machines.
    Allows passing an updated "current node" to the parent state machine,
    e.g. for changing the section level in a directive.
  - Better error messages for grid table markup errors (bug #504),
    based on patch #214 by Jynn Nelson.

-  __ RELEASE-NOTES.html#nested-parsing
-
 * docutils/statemachine.py

  - New attribute `StateMachine.parent_state_machine` to store the
--- a/docutils/RELEASE-NOTES.rst
+++ b/docutils/RELEASE-NOTES.rst
@@ -210,9 +210,8 @@ Removals

 * Remove `states.RSTStateMachine.memo.reporter`,
  `states.RSTStateMachine.memo.section_bubble_up_kludge`,
-  `states.RSTStateMachine.memo.section_level`,
  `states.RSTState.title_inconsistent()`, and `states.Line.eofcheck`
-  in Docutils 2.0. Ignored since Docutils 0.22.1.
+  in Docutils 2.0. Ignored since Docutils 0.22.

 * Remove `parsers.rst.states.Struct` (obsoleted by `types.SimpleNamespace`)
  in Docutils 2.0.
@@ -264,30 +263,8 @@ Misc
 Release 0.23b0 (unpublished)
 ============================

-reStructuredText parser:
-  _`Nested parsing` uses a separate section `title style hierarchy`_ if
-  `states.RSTState.nested_parsing()` is used with ``match_titles=True``.
-  Content included via nested parsing may use section title styles in
-  different order, all sections become sub-sections (or sub-sub-section...)
-  of the current section level. [#]_
-  This ensures that all elements generated by the nested parsing are
-  added to the provided base node (without possible data loss as in
-  Docutils < 0.22).
-
-  No changes are required to document sources that work fine
-  in Docutils <= 0.22.
-
-  .. [#] similar to Sphinx's `sphinx.util.node.nested_parse_with_titles()`
-     and overriding the ``keep_title_context`` argument of
-     `sphinx.util.parsing.nested_parse_to_nodes()`__
-
-     __ https://www.sphinx-doc.org/en/master/extdev/utils.html
-        #sphinx.util.parsing.nested_parse_to_nodes
-
 Bugfixes and improvements (see HISTORY_).

-.. _title style hierarchy: docs/ref/rst/restructuredtext.html#title-styles
-

 Release 0.22 (2025-07-29)
 =========================
--- a/docutils/docutils/parsers/rst/states.py
+++ b/docutils/docutils/parsers/rst/states.py
@@ -104,7 +104,6 @@ from __future__ import annotations

 __docformat__ = 'reStructuredText'

-import copy
 import re
 from types import FunctionType, MethodType
 from types import SimpleNamespace as Struct
@@ -158,13 +157,13 @@ class RSTStateMachine(StateMachineWS):
            inliner = Inliner()
        inliner.init_customizations(document.settings)
        # A collection of objects to share with nested parsers.
-        # The attributes `reporter`, `section_level`, and
-        # `section_bubble_up_kludge` will be removed in Docutils 2.0
+        # The attributes `reporter` and `section_bubble_up_kludge`
+        # will be removed in Docutils 2.0
        self.memo = Struct(document=document,
                           reporter=document.reporter,  # ignored
                           language=self.language,
                           title_styles=[],
-                           section_level=0,  # ignored
+                           section_level=0,  # (0 document, 1 section, ...)
                           section_bubble_up_kludge=False,  # ignored
                           inliner=inliner)
        self.document = document
@@ -187,23 +186,15 @@ class NestedStateMachine(StateMachineWS):
        """
        Parse `input_lines` and populate `node`.

-        Use a separate "title style hierarchy" (changed in Docutils 0.23).
-
        Extend `StateMachineWS.run()`: set up document-wide data.
        """
        self.match_titles = match_titles
-        self.memo = copy.copy(memo)
+        self.memo = memo
        self.document = memo.document
        self.attach_observer(self.document.note_source)
        self.language = memo.language
        self.reporter = self.document.reporter
        self.node = node
-        if match_titles:
-            # Use a separate section title style hierarchy;
-            # ensure all sections in the `input_lines` are treated as
-            # subsections of the current section by blocking lower
-            # section levels with a style that is impossible in rST:
-            self.memo.title_styles = ['x'] * len(node.section_hierarchy())
        results = StateMachineWS.run(self, input_lines, input_offset)
        assert results == [], ('NestedStateMachine.run() results should be '
                               'empty!')
@@ -287,13 +278,10 @@ class RSTState(StateWS):
        :input_offset:
            Line number at start of the block.
        :node:
-            Base node. All generated nodes will be appended to this node.
+            Base node. Generated nodes will be appended to this node.
        :match_titles:
            Allow section titles?
-            A separate section title style hierarchy is used for the nested
-            parsing (all sections are subsections of the current section).
-            The calling code should check whether sections are valid
-            children of the base node and move them or warn otherwise.
+            Caution: May lead to an invalid or mixed up document tree. [#]_
        :state_machine_class:
            Default: `NestedStateMachine`.
        :state_machine_kwargs:
@@ -302,6 +290,12 @@ class RSTState(StateWS):

        Create a new state-machine instance if required.
        Return new offset.
+
+        .. [#] See also ``test_parsers/test_rst/test_nested_parsing.py``
+               and Sphinx's `nested_parse_to_nodes()`__.
+
+        __ https://www.sphinx-doc.org/en/master/extdev/utils.html
+           #sphinx.util.parsing.nested_parse_to_nodes
        """
        use_default = 0
        if state_machine_class is None:
@@ -396,9 +390,8 @@ class RSTState(StateWS):
        (or the root node if the new section is a top-level section).
        """
        title_styles = self.memo.title_styles
-        parent_sections = self.parent.section_hierarchy()
        # current section level: (0 root, 1 section, 2 subsection, ...)
-        oldlevel = len(parent_sections)
+        oldlevel = self.memo.section_level
        # new section level:
        try:  # check for existing title style
            newlevel = title_styles.index(style) + 1
@@ -415,13 +408,33 @@ class RSTState(StateWS):
                nodes.paragraph('', f'Established title styles: {styles}'),
                line=lineno)
            return False
-        # Update parent state:
+        if newlevel <= oldlevel:
+            # new section is sibling or higher up in the section hierarchy
+            parent_sections = self.parent.section_hierarchy()
+            try:
+                new_parent = parent_sections[newlevel-oldlevel-1].parent
+            except IndexError:
+                new_parent = None
+            if new_parent is None:
+                styles = ' '.join('/'.join(style) for style in title_styles)
+                details = (f'The parent of level {newlevel} sections cannot'
+                           ' be reached.\nOne reason may be a high level'
+                           ' section used in a directive that parses its'
+                           ' content into a base node not attached to'
+                           ' the document\n(up to Docutils 0.21,'
+                           ' these sections were silently dropped).')
+                self.parent += self.reporter.error(
+                    f'A level {newlevel} section cannot be used here.',
+                    nodes.literal_block('', source),
+                    nodes.paragraph('', f'Established title styles: {styles}'),
+                    nodes.paragraph('', details),
+                    line=lineno)
+                return False
+            self.parent = new_parent
+        # Update memo:
        if newlevel > len(title_styles):
            title_styles.append(style)
        self.memo.section_level = newlevel
-        if newlevel <= oldlevel:
-            # new section is sibling or higher up in the section hierarchy
-            self.parent = parent_sections[newlevel-1].parent
        return True

    def title_inconsistent(self, sourcetext, lineno):
--- a/docutils/test/test_parsers/test_rst/test_nested_parsing.py
+++ b/docutils/test/test_parsers/test_rst/test_nested_parsing.py
@@ -7,16 +7,20 @@
 Tests for nested parsing with support for sections (cf. states.py).

 The method states.RSTState.nested_parse() provides the argument `match_titles`.
-However, in Docutils, it is only used with `match_titles=False`.
-None of the standard Docutils directives supports section titles in the
-directive content.  (Directives supporting sections in the content are,
-e.g., defined by the "autodoc" and "kerneldoc" Sphinx extensions.)
+With ``match_titles=True``, sections are supported, the section level is
+determined by the document-wide hierarchy of title styles. [1]_

-Up to Docutils 0.22, the section title styles were document-wide enforced and
-sections with current level or higher were silently dropped!
+In Docutils, `nested_parse()` is only used with ``match_titles=False``.
+None of the standard Docutils directives support section titles in the
+directive content.   Up to Docutils 0.22, sections with current level or
+higher were silently dropped!

-Sphinx uses the `sphinx.util.parsing._fresh_title_style_context` context
-manager to provide a separate title style hierarchy for nested parsing.
+Directives supporting sections in the content are defined
+by Sphinx extensions, e.g., "autodoc" and "kerneldoc".
+
+.. [1] Sphinx uses the `sphinx.util.parsing._fresh_title_style_context`
+       context manager to provide a separate title style hierarchy for
+       nested parsing.
 """

 from pathlib import Path
@@ -42,9 +46,9 @@ class ParseIntoNode(rst.Directive):
    has_content = True

    def run(self):
-        # similar to sphinx.util.parsing.nested_parse_to_nodes()
+        # cf. sphinx.util.parsing.nested_parse_to_nodes()
        node = nodes.Element()
-        node.document = self.state.document  # not required
+        node.document = self.state.document
        # support sections (unless we know it is invalid):
        match_titles = isinstance(self.state_machine.node,
                                  (nodes.document, nodes.section))
@@ -58,7 +62,7 @@ class ParseIntoNode(rst.Directive):
                self.state_machine.node = self.state_machine.node[-1]
        except IndexError:
            pass
-        # pass on the new "current node" to parent state machines
+        # Pass current node to parent state machines:
        sm = self.state_machine
        try:
            while True:
@@ -70,30 +74,25 @@ class ParseIntoNode(rst.Directive):


 class ParseIntoCurrentNode(ParseIntoNode):
+    # Attention: this directive is flawed:
+    # * no check for section validity,
+    # * "current" node not updated! -> element order may get lost.
    def run(self):
        node = self.state_machine.node  # the current "insertion point"
-        # support sections (unless we know it is invalid):
-        match_titles = isinstance(node, (nodes.document, nodes.section))
-        self.state.nested_parse(self.content, 0, node, match_titles)
+        self.state.nested_parse(self.content, 0, node, match_titles=True)
        return []  # node already attached to document


 class ParseIntoSectionNode(ParseIntoNode):
+    # Some 3rd party extensions use a <section> as dummy base node.
+    #
+    # Attention: this directive is flawed:
+    # * no check for section validity,
+    # * "current" node not updated! -> element order may get lost.
    def run(self):
-        if not isinstance(self.state_machine.node,
-                          (nodes.document, nodes.section)):
-            msg = self.reporter.error(
-                    'The "nested-section" directive can only be used'
-                    ' where a section is valid.',
-                    nodes.literal_block(self.block_text, self.block_text),
-                    line=self.lineno)
-            return [msg]
-        node = nodes.section('')
-        node.append(nodes.title('', 'generated section'))
-        # In production, also generate and register section name and ID
-        # (cf. rst.states.RSTState.new_subsection()).
+        node = nodes.section()
        self.state.nested_parse(self.content, 0, node, match_titles=True)
-        return [node]
+        return node.children


 class ParserTestCase(unittest.TestCase):
@@ -124,35 +123,34 @@ class ParserTestCase(unittest.TestCase):
 totest = {}

 totest['nested_parsing'] = [
-# Start new section hierarchy with every nested parse.
+# The document-wide section hierarchy is employed also in nested parsing.
 ["""\
 sec1
 ====
 sec1.1
 ------
-
 .. nested::

-  nested1
-  *******
-  nested1.1
-  =========
+  nested1.1.1
+  ***********
+  nested1.1.1.1
+  ~~~~~~~~~~~~~

 sec2
 ====
-The document-wide section title styles are kept.
-
 .. nested::

-  nested2
-  =======
+  skipping2.1
+  ***********
  nested2.1
-  *********
+  ---------
+  inaccessible2
+  =============

 sec2.2
 ------
-sec2.2.1
-~~~~~~~~
+skipping2.2.1
+~~~~~~~~~~~~~
 """,
 """\
 <document source="test data">
@@ -162,32 +160,52 @@ sec2.2.1
        <section ids="sec1-1" names="sec1.1">
            <title>
                sec1.1
-            <section ids="nested1" names="nested1">
+            <section ids="nested1-1-1" names="nested1.1.1">
                <title>
-                    nested1
-                <section ids="nested1-1" names="nested1.1">
+                    nested1.1.1
+                <section ids="nested1-1-1-1" names="nested1.1.1.1">
                    <title>
-                        nested1.1
+                        nested1.1.1.1
    <section ids="sec2" names="sec2">
        <title>
            sec2
-        <paragraph>
-            The document-wide section title styles are kept.
-        <section ids="nested2" names="nested2">
+        <system_message level="3" line="1" source="test data" type="ERROR">
+            <paragraph>
+                Inconsistent title style: skip from level 1 to 3.
+            <literal_block xml:space="preserve">
+                skipping2.1
+                ***********
+            <paragraph>
+                Established title styles: = - * ~
+        <section ids="nested2-1" names="nested2.1">
            <title>
-                nested2
-            <section ids="nested2-1" names="nested2.1">
-                <title>
-                    nested2.1
+                nested2.1
+            <system_message level="3" line="5" source="test data" type="ERROR">
+                <paragraph>
+                    A level 1 section cannot be used here.
+                <literal_block xml:space="preserve">
+                    inaccessible2
+                    =============
+                <paragraph>
+                    Established title styles: = - * ~
+                <paragraph>
+                    The parent of level 1 sections cannot be reached.
+                    One reason may be a high level section used in a directive that parses its content into a base node not attached to the document
+                    (up to Docutils 0.21, these sections were silently dropped).
        <section ids="sec2-2" names="sec2.2">
            <title>
                sec2.2
-            <section ids="sec2-2-1" names="sec2.2.1">
-                <title>
-                    sec2.2.1
+            <system_message level="3" line="25" source="test data" type="ERROR">
+                <paragraph>
+                    Inconsistent title style: skip from level 2 to 4.
+                <literal_block xml:space="preserve">
+                    skipping2.2.1
+                    ~~~~~~~~~~~~~
+                <paragraph>
+                    Established title styles: = - * ~
 """],
-# Move "insertion point" if the nested block contains sections to
-# comply with the validity constraints of the "structure model".
+# The `ParseIntoNode` directive updates the "current node" to comply with
+# the validity constraints of the "structure model".
 ["""\
 .. nested::

@@ -210,8 +228,7 @@ This paragraph belongs to the last nested section.
                This paragraph belongs to the last nested section.
 """],
 ["""\
-.. note:: A preceding directive must not foil the "insertion point move".
-
+.. note:: The next directive is parsed with "nested_list_parse()".
 .. nested::

  nested1
@@ -225,7 +242,7 @@ This paragraph belongs to the last nested section.
 <document source="test data">
    <note>
        <paragraph>
-            A preceding directive must not foil the "insertion point move".
+            The next directive is parsed with "nested_list_parse()".
    <section ids="nested1" names="nested1">
        <title>
            nested1
@@ -251,23 +268,27 @@ This paragraph belongs to the document.
    <paragraph>
        This paragraph belongs to the document.
 """],
-# base node == current node
+# If the base node is the "current node", it is possible to have lower
+# level sections inside the nested content block.
+# The generated nodes are added to the respective parent sections
+# and not necessarily children of the base node.
 ["""\
 sec1
 ====
 sec1.1
 ------
+.. note:: The next directive is parsed with "nested_list_parse()".
 .. nested-current::

-  current1
-  ********
-  current1.1
-  -----------
-  current1.1.1
-  ============
+  nc1.1.1
+  *******
+  nc1.2
+  -----
+  nc2
+  ===

-sec1.1.2
-~~~~~~~~
+sec2.2
+------
 """,
 """\
 <document source="test data">
@@ -277,20 +298,23 @@ sec1.1.2
        <section ids="sec1-1" names="sec1.1">
            <title>
                sec1.1
-            <section ids="current1" names="current1">
+            <note>
+                <paragraph>
+                    The next directive is parsed with "nested_list_parse()".
+            <section ids="nc1-1-1" names="nc1.1.1">
                <title>
-                    current1
-                <section ids="current1-1" names="current1.1">
-                    <title>
-                        current1.1
-                    <section ids="current1-1-1" names="current1.1.1">
-                        <title>
-                            current1.1.1
-            <section ids="sec1-1-2" names="sec1.1.2">
+                    nc1.1.1
+            <section ids="sec2-2" names="sec2.2">
                <title>
-                    sec1.1.2
+                    sec2.2
+        <section ids="nc1-2" names="nc1.2">
+            <title>
+                nc1.2
+    <section ids="nc2" names="nc2">
+        <title>
+            nc2
 """],
-# parse into generated <section> node:
+# Flawed directive (no update of "current node"):
 ["""\
 sec1
 ====
@@ -298,16 +322,10 @@ sec1.1
 ------
 .. nested-section::

-  nested-section1
-  ***************
-  nested-section1.1
-  =================
-
-This paragraph belongs to the last nested section.
-
-sec1.1.2
-~~~~~~~~
+  nested-section1.1.1
+  *******************

+This paragraph belongs to the last nested section (sic!).
 """,
 """\
 <document source="test data">
@@ -317,67 +335,92 @@ sec1.1.2
        <section ids="sec1-1" names="sec1.1">
            <title>
                sec1.1
-            <section>
+            <section ids="nested-section1-1-1" names="nested-section1.1.1">
                <title>
-                    generated section
-                <section ids="nested-section1" names="nested-section1">
-                    <title>
-                        nested-section1
-                    <section ids="nested-section1-1" names="nested-section1.1">
-                        <title>
-                            nested-section1.1
+                    nested-section1.1.1
            <paragraph>
-                This paragraph belongs to the last nested section.
-            <section ids="sec1-1-2" names="sec1.1.2">
-                <title>
-                    sec1.1.2
-    <system_message level="2" line="12" source="test data" type="WARNING">
+                This paragraph belongs to the last nested section (sic!).
+    <system_message level="2" line="10" source="test data" type="WARNING">
        <paragraph>
            Element <section ids="sec1-1" names="sec1.1"> invalid:
              Child element <paragraph> not allowed at this position.
 """],
+# Even if the base node is a <section>, it does not show up in
+# `node.parent_sections()` because it does not have a parent
+# -> we cannot add a sibling section:
+["""\
+sec1
+====
+.. nested-section::
+
+  nested-section1
+  ===============
+  with content
+""",
+"""\
+<document source="test data">
+    <section ids="sec1" names="sec1">
+        <title>
+            sec1
+        <system_message level="3" line="1" source="test data" type="ERROR">
+            <paragraph>
+                A level 1 section cannot be used here.
+            <literal_block xml:space="preserve">
+                nested-section1
+                ===============
+            <paragraph>
+                Established title styles: =
+            <paragraph>
+                The parent of level 1 sections cannot be reached.
+                One reason may be a high level section used in a directive that parses its content into a base node not attached to the document
+                (up to Docutils 0.21, these sections were silently dropped).
+        <paragraph>
+            with content
+"""],
 # Nested parsing in a block-quote:
 ["""\
-  .. nested-current::
-
-    Nested parsing is OK but a section is invalid in a block-quote.
-
-    nested section
-    ==============
-
  .. nested::

+    A section in a block-quote is invalid.
+
    invalid section
    ---------------

+  .. nested-current::
+
+    invalid, too (sic!)
+    ===================
+
  .. nested-section::

-    The <section> base node is invalid in a block-quote.
+    The <section> base node is discarded.
+
+    invalid section (sic!)
+    ----------------------
 """,
 """\
 <document source="test data">
    <block_quote>
        <paragraph>
-            Nested parsing is OK but a section is invalid in a block-quote.
+            A section in a block-quote is invalid.
        <system_message level="3" line="6" source="test data" type="ERROR">
-            <paragraph>
-                Unexpected section title.
-            <literal_block xml:space="preserve">
-                nested section
-                ==============
-        <system_message level="3" line="11" source="test data" type="ERROR">
            <paragraph>
                Unexpected section title.
            <literal_block xml:space="preserve">
                invalid section
                ---------------
-        <system_message level="3" line="13" source="test data" type="ERROR">
-            <paragraph>
-                The "nested-section" directive can only be used where a section is valid.
-            <literal_block xml:space="preserve">
-                .. nested-section::
-                \n\
-                  The <section> base node is invalid in a block-quote.
+        <section ids="invalid-too-sic" names="invalid,\\ too\\ (sic!)">
+            <title>
+                invalid, too (sic!)
+        <paragraph>
+            The <section> base node is discarded.
+        <section ids="invalid-section-sic" names="invalid\\ section\\ (sic!)">
+            <title>
+                invalid section (sic!)
+    <system_message level="2" line="1" source="test data" type="WARNING">
+        <paragraph>
+            Element <block_quote> invalid:
+              Child element <section ids="invalid-too-sic" names="invalid,\\ too\\ (sic!)"> not allowed at this position.
 """],
 ]