rST parser: restore backwards compatibility of nested parsing.

* Keep document-wide title style hierarchy in nested parsing.

* Revert to using `document.memo.section_level` (required for nested parsing
  into a detached base node with document-wide title styles).

+ simpler logic
+ backwards compatible
- more bookkeeping effort
- mandating a document-wide section title style hierarchy is ill-suited for
  inclusion of rST blocks from external sources (e.g. extracted docstrings).
  See `sphinx.util.parsing.nested_parse_to_nodes()` for an alternative.

This should restore compatibility with Sphinx's "only" directive, which was broken by [r10204].
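
The level bookkeeping described above can be sketched with a toy model. This is a simplified illustration, not the code in `states.py`: `Node`, `SectionError`, and `add_section()` are hypothetical stand-ins, and the "skip" check is assumed from the error messages in the test output. It shows that a memo shared between the parent parser and a nested parser keeps the title styles document-wide, and that a higher-level section inside a detached base node cannot find its parent.

```python
from types import SimpleNamespace


class SectionError(Exception):
    """Raised where the real parser would emit a system message."""


class Node:
    """Minimal stand-in for a doctree element (not docutils.nodes)."""

    def __init__(self, name, is_section=False):
        self.name = name
        self.is_section = is_section
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def section_hierarchy(self):
        """Return the section ancestors (including self), outermost first."""
        sections = []
        node = self
        while node is not None:
            if node.is_section:
                sections.append(node)
            node = node.parent
        return sections[::-1]


def add_section(memo, parent, style, title):
    """Attach a new section and return it; `memo` is shared by all parsers."""
    styles = memo.title_styles
    oldlevel = memo.section_level  # 0 document, 1 section, ...
    try:
        newlevel = styles.index(style) + 1  # known title style
    except ValueError:
        newlevel = len(styles) + 1  # new style: next deeper level
    if newlevel > oldlevel + 1:
        raise SectionError(f'skip from level {oldlevel} to {newlevel}')
    if newlevel <= oldlevel:
        # Sibling or higher level: climb up from the current parent.
        hierarchy = parent.section_hierarchy()
        try:
            parent = hierarchy[newlevel - oldlevel - 1].parent
        except IndexError:
            parent = None
        if parent is None:
            raise SectionError(f'level {newlevel} section cannot be used here')
    if newlevel > len(styles):
        styles.append(style)
    memo.section_level = newlevel
    return parent.append(Node(title, is_section=True))


memo = SimpleNamespace(title_styles=[], section_level=0)
document = Node('document')
sec1 = add_section(memo, document, '=', 'sec1')
sec1_1 = add_section(memo, sec1, '-', 'sec1.1')

# Nested parse into a detached base node; the memo is NOT copied.
base = Node('base')
nested = add_section(memo, base, '*', 'nested1.1.1')  # becomes level 3
try:
    add_section(memo, nested, '=', 'inaccessible')  # level 1: out of reach
except SectionError as error:
    message = str(error)
```

In this model the nested `*` title lands at level 3 because `=` and `-` are already established, and the `=` title fails because the detached base node has no section ancestors to climb to. A calling directive would have to attach the base node's children to the document and update the "current node" itself.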

git-svn-id: http://svn.code.sf.net/p/docutils/code/trunk@10226 929543f6-e4f2-0310-98a6-ba3bd3dd1d04
milde
2025-09-05 09:10:58 +00:00
parent f00f836e28
commit 570615a844
4 changed files with 206 additions and 176 deletions


@@ -29,17 +29,14 @@ Release 0.23b0 (unpublished)
* docutils/parsers/rst/states.py
- Relax "section title" system messages from SEVERE to ERROR.
- Ensure new "current node" is valid when switching section level
(cf. bugs #508 and #509).
- Use a `separate title style hierarchy for nested parsing`__.
- Revert to using `document.memo.section_level` to fix behaviour with
nested parsing into a detached node (cf. bugs #508 and #509).
- Set `parent_state_machine` attribute when creating nested state machines.
Allows passing an updated "current node" to the parent state machine,
e.g. for changing the section level in a directive.
- Better error messages for grid table markup errors (bug #504),
based on patch #214 by Jynn Nelson.
__ RELEASE-NOTES.html#nested-parsing
* docutils/statemachine.py
- New attribute `StateMachine.parent_state_machine` to store the


@@ -210,9 +210,8 @@ Removals
* Remove `states.RSTStateMachine.memo.reporter`,
`states.RSTStateMachine.memo.section_bubble_up_kludge`,
`states.RSTStateMachine.memo.section_level`,
`states.RSTState.title_inconsistent()`, and `states.Line.eofcheck`
in Docutils 2.0. Ignored since Docutils 0.22.1.
in Docutils 2.0. Ignored since Docutils 0.22.
* Remove `parsers.rst.states.Struct` (obsoleted by `types.SimpleNamespace`)
in Docutils 2.0.
@@ -264,30 +263,8 @@ Misc
Release 0.23b0 (unpublished)
============================
reStructuredText parser:
_`Nested parsing` uses a separate section `title style hierarchy`_ if
`states.RSTState.nested_parsing()` is used with ``match_titles=True``.
Content included via nested parsing may use section title styles in
different order, all sections become sub-sections (or sub-sub-section...)
of the current section level. [#]_
This ensures that all elements generated by the nested parsing are
added to the provided base node (without possible data loss as in
Docutils < 0.22).
No changes are required to document sources that work fine
in Docutils <= 0.22.
.. [#] similar to Sphinx's `sphinx.util.node.nested_parse_with_titles()`
and overriding the ``keep_title_context`` argument of
`sphinx.util.parsing.nested_parse_to_nodes()`__
__ https://www.sphinx-doc.org/en/master/extdev/utils.html
#sphinx.util.parsing.nested_parse_to_nodes
Bugfixes and improvements (see HISTORY_).
.. _title style hierarchy: docs/ref/rst/restructuredtext.html#title-styles
Release 0.22 (2025-07-29)
=========================


@@ -104,7 +104,6 @@ from __future__ import annotations
__docformat__ = 'reStructuredText'
import copy
import re
from types import FunctionType, MethodType
from types import SimpleNamespace as Struct
@@ -158,13 +157,13 @@ class RSTStateMachine(StateMachineWS):
inliner = Inliner()
inliner.init_customizations(document.settings)
# A collection of objects to share with nested parsers.
# The attributes `reporter`, `section_level`, and
# `section_bubble_up_kludge` will be removed in Docutils 2.0
# The attributes `reporter` and `section_bubble_up_kludge`
# will be removed in Docutils 2.0
self.memo = Struct(document=document,
reporter=document.reporter, # ignored
language=self.language,
title_styles=[],
section_level=0, # ignored
section_level=0, # (0 document, 1 section, ...)
section_bubble_up_kludge=False, # ignored
inliner=inliner)
self.document = document
@@ -187,23 +186,15 @@ class NestedStateMachine(StateMachineWS):
"""
Parse `input_lines` and populate `node`.
Use a separate "title style hierarchy" (changed in Docutils 0.23).
Extend `StateMachineWS.run()`: set up document-wide data.
"""
self.match_titles = match_titles
self.memo = copy.copy(memo)
self.memo = memo
self.document = memo.document
self.attach_observer(self.document.note_source)
self.language = memo.language
self.reporter = self.document.reporter
self.node = node
if match_titles:
# Use a separate section title style hierarchy;
# ensure all sections in the `input_lines` are treated as
# subsections of the current section by blocking lower
# section levels with a style that is impossible in rST:
self.memo.title_styles = ['x'] * len(node.section_hierarchy())
results = StateMachineWS.run(self, input_lines, input_offset)
assert results == [], ('NestedStateMachine.run() results should be '
'empty!')
@@ -287,13 +278,10 @@ class RSTState(StateWS):
:input_offset:
Line number at start of the block.
:node:
Base node. All generated nodes will be appended to this node.
Base node. Generated nodes will be appended to this node.
:match_titles:
Allow section titles?
A separate section title style hierarchy is used for the nested
parsing (all sections are subsections of the current section).
The calling code should check whether sections are valid
children of the base node and move them or warn otherwise.
Caution: May lead to an invalid or mixed up document tree. [#]_
:state_machine_class:
Default: `NestedStateMachine`.
:state_machine_kwargs:
@@ -302,6 +290,12 @@ class RSTState(StateWS):
Create a new state-machine instance if required.
Return new offset.
.. [#] See also ``test_parsers/test_rst/test_nested_parsing.py``
and Sphinx's `nested_parse_to_nodes()`__.
__ https://www.sphinx-doc.org/en/master/extdev/utils.html
#sphinx.util.parsing.nested_parse_to_nodes
"""
use_default = 0
if state_machine_class is None:
@@ -396,9 +390,8 @@ class RSTState(StateWS):
(or the root node if the new section is a top-level section).
"""
title_styles = self.memo.title_styles
parent_sections = self.parent.section_hierarchy()
# current section level: (0 root, 1 section, 2 subsection, ...)
oldlevel = len(parent_sections)
oldlevel = self.memo.section_level
# new section level:
try: # check for existing title style
newlevel = title_styles.index(style) + 1
@@ -415,13 +408,33 @@ class RSTState(StateWS):
nodes.paragraph('', f'Established title styles: {styles}'),
line=lineno)
return False
# Update parent state:
if newlevel <= oldlevel:
# new section is sibling or higher up in the section hierarchy
parent_sections = self.parent.section_hierarchy()
try:
new_parent = parent_sections[newlevel-oldlevel-1].parent
except IndexError:
new_parent = None
if new_parent is None:
styles = ' '.join('/'.join(style) for style in title_styles)
details = (f'The parent of level {newlevel} sections cannot'
' be reached.\nOne reason may be a high level'
' section used in a directive that parses its'
' content into a base node not attached to'
' the document\n(up to Docutils 0.21,'
' these sections were silently dropped).')
self.parent += self.reporter.error(
f'A level {newlevel} section cannot be used here.',
nodes.literal_block('', source),
nodes.paragraph('', f'Established title styles: {styles}'),
nodes.paragraph('', details),
line=lineno)
return False
self.parent = new_parent
# Update memo:
if newlevel > len(title_styles):
title_styles.append(style)
self.memo.section_level = newlevel
if newlevel <= oldlevel:
# new section is sibling or higher up in the section hierarchy
self.parent = parent_sections[newlevel-1].parent
return True
def title_inconsistent(self, sourcetext, lineno):


@@ -7,16 +7,20 @@
Tests for nested parsing with support for sections (cf. states.py).
The method states.RSTState.nested_parse() provides the argument `match_titles`.
However, in Docutils, it is only used with `match_titles=False`.
None of the standard Docutils directives supports section titles in the
directive content. (Directives supporting sections in the content are,
e.g., defined by the "autodoc" and "kerneldoc" Sphinx extensions.)
With ``match_titles=True``, sections are supported, the section level is
determined by the document-wide hierarchy of title styles. [1]_
Up to Docutils 0.22, the section title styles were document-wide enforced and
sections with current level or higher were silently dropped!
In Docutils, `nested_parse()` is only used with ``match_titles=False``.
None of the standard Docutils directives support section titles in the
directive content. Up to Docutils 0.22, sections with current level or
higher were silently dropped!
Sphinx uses the `sphinx.util.parsing._fresh_title_style_context` context
manager to provide a separate title style hierarchy for nested parsing.
Directives supporting sections in the content are defined
by Sphinx extensions, e.g., "autodoc" and "kerneldoc".
.. [1] Sphinx uses the `sphinx.util.parsing._fresh_title_style_context`
context manager to provide a separate title style hierarchy for
nested parsing.
"""
from pathlib import Path
@@ -42,9 +46,9 @@ class ParseIntoNode(rst.Directive):
has_content = True
def run(self):
# similar to sphinx.util.parsing.nested_parse_to_nodes()
# cf. sphinx.util.parsing.nested_parse_to_nodes()
node = nodes.Element()
node.document = self.state.document # not required
node.document = self.state.document
# support sections (unless we know it is invalid):
match_titles = isinstance(self.state_machine.node,
(nodes.document, nodes.section))
@@ -58,7 +62,7 @@ class ParseIntoNode(rst.Directive):
self.state_machine.node = self.state_machine.node[-1]
except IndexError:
pass
# pass on the new "current node" to parent state machines
# Pass current node to parent state machines:
sm = self.state_machine
try:
while True:
@@ -70,30 +74,25 @@ class ParseIntoNode(rst.Directive):
class ParseIntoCurrentNode(ParseIntoNode):
# Attention: this directive is flawed:
# * no check for section validity,
# * "current" node not updated! -> element order may get lost.
def run(self):
node = self.state_machine.node # the current "insertion point"
# support sections (unless we know it is invalid):
match_titles = isinstance(node, (nodes.document, nodes.section))
self.state.nested_parse(self.content, 0, node, match_titles)
self.state.nested_parse(self.content, 0, node, match_titles=True)
return [] # node already attached to document
class ParseIntoSectionNode(ParseIntoNode):
# Some 3rd party extensions use a <section> as dummy base node.
#
# Attention: this directive is flawed:
# * no check for section validity,
# * "current" node not updated! -> element order may get lost.
def run(self):
if not isinstance(self.state_machine.node,
(nodes.document, nodes.section)):
msg = self.reporter.error(
'The "nested-section" directive can only be used'
' where a section is valid.',
nodes.literal_block(self.block_text, self.block_text),
line=self.lineno)
return [msg]
node = nodes.section('')
node.append(nodes.title('', 'generated section'))
# In production, also generate and register section name and ID
# (cf. rst.states.RSTState.new_subsection()).
node = nodes.section()
self.state.nested_parse(self.content, 0, node, match_titles=True)
return [node]
return node.children
class ParserTestCase(unittest.TestCase):
@@ -124,35 +123,34 @@ class ParserTestCase(unittest.TestCase):
totest = {}
totest['nested_parsing'] = [
# Start new section hierarchy with every nested parse.
# The document-wide section hierarchy is employed also in nested parsing.
["""\
sec1
====
sec1.1
------
.. nested::
nested1
*******
nested1.1
=========
nested1.1.1
***********
nested1.1.1.1
~~~~~~~~~~~~~
sec2
====
The document-wide section title styles are kept.
.. nested::
nested2
=======
skipping2.1
***********
nested2.1
*********
---------
inaccessible2
=============
sec2.2
------
sec2.2.1
~~~~~~~~
skipping2.2.1
~~~~~~~~~~~~~
""",
"""\
<document source="test data">
@@ -162,32 +160,52 @@ sec2.2.1
<section ids="sec1-1" names="sec1.1">
<title>
sec1.1
<section ids="nested1" names="nested1">
<section ids="nested1-1-1" names="nested1.1.1">
<title>
nested1
<section ids="nested1-1" names="nested1.1">
nested1.1.1
<section ids="nested1-1-1-1" names="nested1.1.1.1">
<title>
nested1.1
nested1.1.1.1
<section ids="sec2" names="sec2">
<title>
sec2
<paragraph>
The document-wide section title styles are kept.
<section ids="nested2" names="nested2">
<system_message level="3" line="1" source="test data" type="ERROR">
<paragraph>
Inconsistent title style: skip from level 1 to 3.
<literal_block xml:space="preserve">
skipping2.1
***********
<paragraph>
Established title styles: = - * ~
<section ids="nested2-1" names="nested2.1">
<title>
nested2
<section ids="nested2-1" names="nested2.1">
<title>
nested2.1
nested2.1
<system_message level="3" line="5" source="test data" type="ERROR">
<paragraph>
A level 1 section cannot be used here.
<literal_block xml:space="preserve">
inaccessible2
=============
<paragraph>
Established title styles: = - * ~
<paragraph>
The parent of level 1 sections cannot be reached.
One reason may be a high level section used in a directive that parses its content into a base node not attached to the document
(up to Docutils 0.21, these sections were silently dropped).
<section ids="sec2-2" names="sec2.2">
<title>
sec2.2
<section ids="sec2-2-1" names="sec2.2.1">
<title>
sec2.2.1
<system_message level="3" line="25" source="test data" type="ERROR">
<paragraph>
Inconsistent title style: skip from level 2 to 4.
<literal_block xml:space="preserve">
skipping2.2.1
~~~~~~~~~~~~~
<paragraph>
Established title styles: = - * ~
"""],
# Move "insertion point" if the nested block contains sections to
# comply with the validity constraints of the "structure model".
# The `ParseIntoNode` directive updates the "current node" to comply with
# the validity constraints of the "structure model".
["""\
.. nested::
@@ -210,8 +228,7 @@ This paragraph belongs to the last nested section.
This paragraph belongs to the last nested section.
"""],
["""\
.. note:: A preceding directive must not foil the "insertion point move".
.. note:: The next directive is parsed with "nested_list_parse()".
.. nested::
nested1
@@ -225,7 +242,7 @@ This paragraph belongs to the last nested section.
<document source="test data">
<note>
<paragraph>
A preceding directive must not foil the "insertion point move".
The next directive is parsed with "nested_list_parse()".
<section ids="nested1" names="nested1">
<title>
nested1
@@ -251,23 +268,27 @@ This paragraph belongs to the document.
<paragraph>
This paragraph belongs to the document.
"""],
# base node == current node
# If the base node is the "current node", it is possible to have lower
# level sections inside the nested content block.
# The generated nodes are added to the respective parent sections
# and not necessarily children of the base node.
["""\
sec1
====
sec1.1
------
.. note:: The next directive is parsed with "nested_list_parse()".
.. nested-current::
current1
********
current1.1
-----------
current1.1.1
============
nc1.1.1
*******
nc1.2
-----
nc2
===
sec1.1.2
~~~~~~~~
sec2.2
------
""",
"""\
<document source="test data">
@@ -277,20 +298,23 @@ sec1.1.2
<section ids="sec1-1" names="sec1.1">
<title>
sec1.1
<section ids="current1" names="current1">
<note>
<paragraph>
The next directive is parsed with "nested_list_parse()".
<section ids="nc1-1-1" names="nc1.1.1">
<title>
current1
<section ids="current1-1" names="current1.1">
<title>
current1.1
<section ids="current1-1-1" names="current1.1.1">
<title>
current1.1.1
<section ids="sec1-1-2" names="sec1.1.2">
nc1.1.1
<section ids="sec2-2" names="sec2.2">
<title>
sec1.1.2
sec2.2
<section ids="nc1-2" names="nc1.2">
<title>
nc1.2
<section ids="nc2" names="nc2">
<title>
nc2
"""],
# parse into generated <section> node:
# Flawed directive (no update of "current node"):
["""\
sec1
====
@@ -298,16 +322,10 @@ sec1.1
------
.. nested-section::
nested-section1
***************
nested-section1.1
=================
This paragraph belongs to the last nested section.
sec1.1.2
~~~~~~~~
nested-section1.1.1
*******************
This paragraph belongs to the last nested section (sic!).
""",
"""\
<document source="test data">
@@ -317,67 +335,92 @@ sec1.1.2
<section ids="sec1-1" names="sec1.1">
<title>
sec1.1
<section>
<section ids="nested-section1-1-1" names="nested-section1.1.1">
<title>
generated section
<section ids="nested-section1" names="nested-section1">
<title>
nested-section1
<section ids="nested-section1-1" names="nested-section1.1">
<title>
nested-section1.1
nested-section1.1.1
<paragraph>
This paragraph belongs to the last nested section.
<section ids="sec1-1-2" names="sec1.1.2">
<title>
sec1.1.2
<system_message level="2" line="12" source="test data" type="WARNING">
This paragraph belongs to the last nested section (sic!).
<system_message level="2" line="10" source="test data" type="WARNING">
<paragraph>
Element <section ids="sec1-1" names="sec1.1"> invalid:
Child element <paragraph> not allowed at this position.
"""],
# Even if the base node is a <section>, it does not show up in
# `node.parent_sections()` because it does not have a parent
# -> we cannot add a sibling section:
["""\
sec1
====
.. nested-section::
nested-section1
===============
with content
""",
"""\
<document source="test data">
<section ids="sec1" names="sec1">
<title>
sec1
<system_message level="3" line="1" source="test data" type="ERROR">
<paragraph>
A level 1 section cannot be used here.
<literal_block xml:space="preserve">
nested-section1
===============
<paragraph>
Established title styles: =
<paragraph>
The parent of level 1 sections cannot be reached.
One reason may be a high level section used in a directive that parses its content into a base node not attached to the document
(up to Docutils 0.21, these sections were silently dropped).
<paragraph>
with content
"""],
# Nested parsing in a block-quote:
["""\
.. nested-current::
Nested parsing is OK but a section is invalid in a block-quote.
nested section
==============
.. nested::
A section in a block-quote is invalid.
invalid section
---------------
.. nested-current::
invalid, too (sic!)
===================
.. nested-section::
The <section> base node is invalid in a block-quote.
The <section> base node is discarded.
invalid section (sic!)
----------------------
""",
"""\
<document source="test data">
<block_quote>
<paragraph>
Nested parsing is OK but a section is invalid in a block-quote.
A section in a block-quote is invalid.
<system_message level="3" line="6" source="test data" type="ERROR">
<paragraph>
Unexpected section title.
<literal_block xml:space="preserve">
nested section
==============
<system_message level="3" line="11" source="test data" type="ERROR">
<paragraph>
Unexpected section title.
<literal_block xml:space="preserve">
invalid section
---------------
<system_message level="3" line="13" source="test data" type="ERROR">
<paragraph>
The "nested-section" directive can only be used where a section is valid.
<literal_block xml:space="preserve">
.. nested-section::
\n\
The <section> base node is invalid in a block-quote.
<section ids="invalid-too-sic" names="invalid,\\ too\\ (sic!)">
<title>
invalid, too (sic!)
<paragraph>
The <section> base node is discarded.
<section ids="invalid-section-sic" names="invalid\\ section\\ (sic!)">
<title>
invalid section (sic!)
<system_message level="2" line="1" source="test data" type="WARNING">
<paragraph>
Element <block_quote> invalid:
Child element <section ids="invalid-too-sic" names="invalid,\\ too\\ (sic!)"> not allowed at this position.
"""],
]