No Story: Making etree ignore encoding

When we pass the full PPR template to the parameter function, it still includes the XML declaration header. Because that header declares an encoding, lxml's etree.fromstring refuses to parse the template when it is given as a Python str. This change trims the header, adds the parameters, then puts the header back before saving the PPR.
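The trim-and-restore step can be pictured as two small helpers. This is a minimal sketch, not the code in this MR: the names split_declaration and restore_declaration are hypothetical, and it assumes the declaration, when present, is the first thing in the template.

```python
import re

# Hypothetical helper pattern: matches a leading XML declaration such as
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
# The \s after "xml" keeps processing instructions like <?xml-stylesheet ...?> from matching.
_XML_DECLARATION = re.compile(r"\s*(<\?xml\s.*?\?>)\s*", re.DOTALL)


def split_declaration(document: str) -> tuple[str, str]:
    """Return (declaration, body); declaration is "" if the document has none."""
    match = _XML_DECLARATION.match(document)
    if match is None:
        return "", document
    return match.group(1), document[match.end():]


def restore_declaration(declaration: str, body: str) -> str:
    """Prepend the declaration (if there was one) to the serialized body."""
    return f"{declaration}\n{body}" if declaration else body


template = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<root><Command>x</Command></root>\n'
declaration, body = split_declaration(template)
# body can now go to etree.fromstring(); the declaration is kept for saving.
```

Matching the declaration itself, rather than splitting on the first newline, means a template that has no declaration is passed through unchanged instead of losing its first line.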

Demo:

>>> from lxml import etree
>>> template = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
... <ns2:SciPipeRequest xmlns:ns2="Common/pipelinescience/SciPipeRequest">
...     <ProjectSummary>
...         <ProposalCode>VLA/{{projectCode}}</ProposalCode>
...         <Observatory>NRAO</Observatory>
...         <Telescope>VLA</Telescope>
...         <ProcessingSite>Socorro</ProcessingSite>
...         <Operator>vlapipe</Operator>
...         <Mode>SCIENCE</Mode>
...         <Version>NGRH-ALMA-10_8</Version>
...         <CreationTime>{{created_at}}</CreationTime>
...     </ProjectSummary>
...     <ProjectStructure>TBD</ProjectStructure>
...     <ProcessingRequests>
...         <RootDirectory>{{root_directory}}</RootDirectory>
...         <ProcessingRequest>
...             <ProcessingIntents/>
...             {{#casa_recipe}}{{{.}}}{{/casa_recipe}}
...             {{^casa_recipe}}
...             <ProcessingProcedure>
...                 <ProcedureTitle>hifv_calimage_cont_cube_selfcal</ProcedureTitle>
...                 <ProcessingCommand>
...                     <Command xmlns="">hifv_importdata</Command>
...                     <ParameterSet>
...                     </ParameterSet>
...                 </ProcessingCommand>
...             </ProcessingProcedure>
...         </ProcessingRequest>
...     </ProcessingRequests>
... </ns2:SciPipeRequest>
... """
>>> subset = template.split("\n", 1)[1]  # drop line 1, the XML declaration
>>> print(subset)
<ns2:SciPipeRequest xmlns:ns2="Common/pipelinescience/SciPipeRequest">
    <ProjectSummary>
        <ProposalCode>VLA/{{projectCode}}</ProposalCode>
        <Observatory>NRAO</Observatory>
        <Telescope>VLA</Telescope>
        <ProcessingSite>Socorro</ProcessingSite>
        <Operator>vlapipe</Operator>
        <Mode>SCIENCE</Mode>
        <Version>NGRH-ALMA-10_8</Version>
        <CreationTime>{{created_at}}</CreationTime>
    </ProjectSummary>
    <ProjectStructure>TBD</ProjectStructure>
    <ProcessingRequests>
        <RootDirectory>{{root_directory}}</RootDirectory>
        <ProcessingRequest>
            <ProcessingIntents/>
            {{#casa_recipe}}{{{.}}}{{/casa_recipe}}
            {{^casa_recipe}}
            <ProcessingProcedure>
                <ProcedureTitle>hifv_calimage_cont_cube_selfcal</ProcedureTitle>
                <ProcessingCommand>
                    <Command xmlns="">hifv_importdata</Command>
                    <ParameterSet>
                    </ParameterSet>
                </ProcessingCommand>
            </ProcessingProcedure>
        </ProcessingRequest>
    </ProcessingRequests>
</ns2:SciPipeRequest>

>>> root = etree.fromstring(subset)
>>> command_name = "hifv_importdata"
>>> command_element = root.xpath(f'.//Command[text()="{command_name}"]')[0]
>>> print(command_element.text)
hifv_importdata
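Continuing the demo, here is a sketch of the "add the parameters, then put the header back" steps. The function name add_parameters, the Parameter/Keyword/Value element names, and the example_keyword/example_value pair are hypothetical placeholders, not taken from this MR; the element names should be checked against the actual PPR schema.

```python
from lxml import etree

DECLARATION = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
BODY = """<ns2:SciPipeRequest xmlns:ns2="Common/pipelinescience/SciPipeRequest">
    <ProcessingRequests>
        <ProcessingRequest>
            <ProcessingProcedure>
                <ProcessingCommand>
                    <Command xmlns="">hifv_importdata</Command>
                    <ParameterSet>
                    </ParameterSet>
                </ProcessingCommand>
            </ProcessingProcedure>
        </ProcessingRequest>
    </ProcessingRequests>
</ns2:SciPipeRequest>"""


def add_parameters(body: str, command_name: str, params: dict[str, str]) -> str:
    """Hypothetical helper: append parameters to the ParameterSet next to a Command."""
    root = etree.fromstring(body)  # body has no declaration, so a str is accepted
    command = root.xpath(f'.//Command[text()="{command_name}"]')[0]
    # Assumption: the ParameterSet is a sibling of the matching Command.
    param_set = command.getparent().find("ParameterSet")
    for keyword, value in params.items():
        # Hypothetical element names; verify against the real PPR schema.
        parameter = etree.SubElement(param_set, "Parameter")
        etree.SubElement(parameter, "Keyword").text = keyword
        etree.SubElement(parameter, "Value").text = value
    return etree.tostring(root, encoding="unicode")


# Put the header back in front of the serialized body before saving.
ppr = DECLARATION + "\n" + add_parameters(BODY, "hifv_importdata", {"example_keyword": "example_value"})
```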

And here is the error reproduced by passing the full template, header included, to etree:

>>> etree.fromstring(template)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/etree.pyx", line 3307, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1990, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
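The error message also points at another route: give lxml bytes instead of a str. A sketch of that alternative, which this MR does not take, assuming the template is encoded with the same codec its declaration names (UTF-8 here). lxml then honors the declaration, and etree.tostring(..., xml_declaration=True) writes a declaration back out, although lxml formats it itself, so it may not be byte-for-byte identical to the template's header.

```python
from lxml import etree

template = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:SciPipeRequest xmlns:ns2="Common/pipelinescience/SciPipeRequest">
    <ProcessingRequests>
        <ProcessingRequest>
            <ProcessingProcedure>
                <ProcessingCommand>
                    <Command xmlns="">hifv_importdata</Command>
                    <ParameterSet/>
                </ProcessingCommand>
            </ProcessingProcedure>
        </ProcessingRequest>
    </ProcessingRequests>
</ns2:SciPipeRequest>
"""

# Bytes input is allowed to carry an encoding declaration; the codec used
# here must match the one the declaration names.
root = etree.fromstring(template.encode("utf-8"))
command = root.xpath('.//Command[text()="hifv_importdata"]')[0]

# Ask lxml to emit the declaration itself when serializing.
ppr_bytes = etree.tostring(root, xml_declaration=True, encoding="UTF-8", standalone=True)
```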
Edited by Daniel Nemergut
