mirror of https://0xacab.org/jvoisin/mat2 synced 2025-10-06 16:42:57 +02:00

10 Commits

Author SHA1 Message Date
jvoisin
235403bc11 Edit README.md 2025-09-04 15:10:12 +02:00
jvoisin
102f08cd28 Switch the project from 0xacab to github
While the folks running 0xacab are much more lovely than the github ones, this
project has outgrown the former:

- Github offers beefy continuous integration, making it easier to run the
  testsuite on every python version, instead of using a weird docker-based
  contraption. Moreover, I'd rather burn some Microsoft money than 0xacab's.
- Opening an account on 0xacab is non-trivial (by design), making it tedious
  for people to report issues and contribute to mat2.
- Gitlab is becoming unbearably slow and convoluted, even compared to Github's
  awful Copilot/AI push.

It's a sad state of affairs, but it's a pragmatic decision. People who don't
have a Github account can still report issues and send patches by sending me an
email.
2025-09-04 14:35:36 +02:00
jvoisin
7a8ea224bc Fix issue introduced in f073444
The continuous integration on 0xacab didn't run, so it didn't catch this issue.
It seems like we'll have to move to github or whatever instead, sigh.
2025-09-01 23:52:43 +02:00
jvoisin
504efb2448 Remove mypy from the CI
It has always been useless at best, and a nuisance most of the time.
2025-09-01 14:35:25 +02:00
jvoisin
f07344444d Fix a broken test
Reported-By: https://github.com/NixOS/nixpkgs/issues/436421
2025-08-25 12:07:15 +02:00
jvoisin
473903b70e Fix HEIC parsing with the latest exiftool 2025-04-03 17:34:44 +02:00
jvoisin
1438cf7bd4 Disable webp tests for now
```
======================================================================
ERROR: test_all_parametred (tests.test_libmat2.TestCleaning.test_all_parametred) (case={'name': 'webp', 'parser': <class 'libmat2.images.WEBPParser'>, 'meta': {'Warning': '[minor] Improper EXIF header'}, 'expected_meta': {}})
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builds/jvoisin/mat2/libmat2/images.py", line 109, in __init__
    GdkPixbuf.Pixbuf.new_from_file(self.filename)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
gi.repository.GLib.GError: gdk-pixbuf-error-quark: Couldn’t recognize the image file format for file “./tests/data/clean.webp” (3)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/builds/jvoisin/mat2/tests/test_libmat2.py", line 557, in test_all_parametred
    p1 = case['parser'](target)
  File "/builds/jvoisin/mat2/libmat2/images.py", line 111, in __init__
    raise ValueError
ValueError
```

Pending on https://0xacab.org/georg/mat2-ci-images/-/issues/14
2025-04-03 17:34:40 +02:00
jvoisin
e740a9559f Properly handle an exception
```
Traceback (most recent call last):
  File "/builds/jvoisin/mat2/tests/test_deep_cleaning.py", line 147, in test_office
    meta = p.get_meta()
  File "/builds/jvoisin/mat2/libmat2/archive.py", line 155, in get_meta
    zin.extract(member=item, path=temp_folder)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/zipfile/__init__.py", line 1762, in extract
    return self._extract_member(member, path, pwd)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/zipfile/__init__.py", line 1829, in _extract_member
    os.makedirs(upperdirs, exist_ok=True)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen os>", line 227, in makedirs
OSError: [Errno 28] No space left on device: '/tmp/tmptl1ibyv6/word/theme'
```

This should never happen™, but just in case…
2025-04-03 15:24:34 +02:00
Vincent Deffontaines
2b58eece50 Add webp support 2025-03-18 22:20:17 +01:00
georg
29f404bce3 CI: run tests via python3.{13,14} 2025-01-09 09:52:47 +00:00
15 changed files with 149 additions and 342 deletions

.github/workflows/builds.yaml (new file, 45 lines)

@@ -0,0 +1,45 @@
name: CI for Python versions
on:
  pull_request:
  push:
  schedule:
    - cron: '0 16 * * 5'
jobs:
  linting:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-python@v5
      - run: pip install ruff
      - run: |
          ruff check .
  build:
    needs: linting
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14.0-rc.2"]
    steps:
      - uses: actions/checkout@v5
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          sudo apt-get install --no-install-recommends --no-install-suggests --yes \
            ffmpeg \
            gir1.2-gdkpixbuf-2.0 \
            gir1.2-poppler-0.18 \
            gir1.2-rsvg-2.0 \
            libimage-exiftool-perl \
            python3-gi-cairo \
            libcairo2-dev \
            libgirepository-2.0-dev \
            libgirepository1.0-dev \
            gobject-introspection \
            python3-mutagen
          pip install .
      - name: Build and run the testsuite
        run: python3 -m unittest discover -v


@@ -1,103 +0,0 @@
variables:
  CONTAINER_REGISTRY: $CI_REGISTRY/georg/mat2-ci-images
  GIT_DEPTH: "5"
  GIT_STRATEGY: clone
stages:
  - linting
  - test
.prepare_env: &prepare_env
  before_script:  # This is needed to not run the testsuite as root
    - useradd --home-dir ${CI_PROJECT_DIR} mat2
    - chown -R mat2 .
linting:ruff:
  image: $CONTAINER_REGISTRY:linting
  stage: linting
  script:
    - apt update
    - apt install -qqy --no-install-recommends python3-venv
    - python3 -m venv venv
    - source venv/bin/activate
    - pip3 install ruff
    - ruff check .
linting:mypy:
  image: $CONTAINER_REGISTRY:linting
  stage: linting
  script:
    - mypy --ignore-missing-imports mat2 libmat2/*.py
tests:archlinux:
  image: $CONTAINER_REGISTRY:archlinux
  stage: test
  script:
    - python3 -m unittest discover -v
tests:debian:
  image: $CONTAINER_REGISTRY:debian
  stage: test
  <<: *prepare_env
  script:
    - apt-get -qqy purge bubblewrap
    - su - mat2 -c "python3-coverage run --branch -m unittest discover -s tests/"
    - su - mat2 -c "python3-coverage report --fail-under=95 -m --include 'libmat2/*'"
tests:debian_with_bubblewrap:
  image: $CONTAINER_REGISTRY:debian
  stage: test
  allow_failure: true
  <<: *prepare_env
  script:
    - apt-get -qqy install bubblewrap
    - python3 -m unittest discover -v
tests:fedora:
  image: $CONTAINER_REGISTRY:fedora
  stage: test
  script:
    - python3 -m unittest discover -v
tests:gentoo:
  image: $CONTAINER_REGISTRY:gentoo
  stage: test
  <<: *prepare_env
  script:
    - su - mat2 -c "python3 -m unittest discover -v"
tests:python3.7:
  image: $CONTAINER_REGISTRY:python3.7
  stage: test
  script:
    - python3 -m unittest discover -v
tests:python3.8:
  image: $CONTAINER_REGISTRY:python3.8
  stage: test
  script:
    - python3 -m unittest discover -v
tests:python3.9:
  image: $CONTAINER_REGISTRY:python3.9
  stage: test
  script:
    - python3 -m unittest discover -v
tests:python3.10:
  image: $CONTAINER_REGISTRY:python3.10
  stage: test
  script:
    - python3 -m unittest discover -v
tests:python3.11:
  image: $CONTAINER_REGISTRY:python3.11
  stage: test
  script:
    - python3 -m unittest discover -v
tests:python3.12:
  image: $CONTAINER_REGISTRY:python3.12
  stage: test
  script:
    - python3 -m unittest discover -v


@@ -1,18 +0,0 @@
[FORMAT]
good-names=e,f,i,x,s
max-locals=20
[MESSAGES CONTROL]
disable=
    fixme,
    invalid-name,
    duplicate-code,
    missing-docstring,
    protected-access,
    abstract-method,
    wrong-import-position,
    catching-non-exception,
    cell-var-from-loop,
    locally-disabled,
    raise-missing-from,
    invalid-sequence-index,  # pylint doesn't like things like `Tuple[int, bytes]` in type annotation


@@ -1,9 +1,9 @@
 # Contributing to mat2
-The main repository for mat2 is on [0xacab]( https://0xacab.org/jvoisin/mat2 ),
+The main repository for mat2 is on [github]( https://github.com/jvoisin/mat2 ),
 but you can send patches to jvoisin by [email](https://dustri.org/) if you prefer.
-Do feel free to pick up [an issue]( https://0xacab.org/jvoisin/mat2/issues )
+Do feel free to pick up [an issue]( https://github.com/jvoisin/mat2/issues )
 and to send a pull-request.
 Before sending the pull-request, please do check that everything is fine by
@@ -27,11 +27,11 @@ Since mat2 is written in Python3, please conform as much as possible to the
 # Doing a release
-1. Update the [changelog](https://0xacab.org/jvoisin/mat2/blob/master/CHANGELOG.md)
-2. Update the version in the [mat2](https://0xacab.org/jvoisin/mat2/blob/master/mat2) file
-3. Update the version in the [setup.py](https://0xacab.org/jvoisin/mat2/blob/master/setup.py) file
-4. Update the version in the [pyproject.toml](https://0xacab.org/jvoisin/mat2/blob/master/yproject.toml) file
-5. Update the version and date in the [man page](https://0xacab.org/jvoisin/mat2/blob/master/doc/mat2.1)
+1. Update the [changelog](https://github.com/jvoisin/mat2/blob/master/CHANGELOG.md)
+2. Update the version in the [mat2](https://github.com/jvoisin/mat2/blob/master/mat2) file
+3. Update the version in the [setup.py](https://github.com/jvoisin/mat2/blob/master/setup.py) file
+4. Update the version in the [pyproject.toml](https://github.com/jvoisin/mat2/blob/master/yproject.toml) file
+5. Update the version and date in the [man page](https://github.com/jvoisin/mat2/blob/master/doc/mat2.1)
 6. Commit the modified files
 7. Create a tag with `git tag -s $VERSION`
 8. Push the commit with `git push origin master`
@@ -39,7 +39,7 @@ Since mat2 is written in Python3, please conform as much as possible to the
 10. Download the gitlab archive of the release
 11. Diff it against the local copy
 12. If there is no difference, sign the archive with `gpg --armor --detach-sign mat2-$VERSION.tar.xz`
-13. Upload the signature on Gitlab's [tag page](https://0xacab.org/jvoisin/mat2/tags) and add the changelog there
+13. Upload the signature on Gitlab's [tag page](https://github.com/jvoisin/mat2/tags) and add the changelog there
 14. Announce the release on the [mailing list](https://mailman.boum.org/listinfo/mat-dev)
 15. Sign'n'upload the new version on pypi with `python3 setup.py sdist bdist_wheel` then `twine upload -s dist/*`
 16. Do the secret release dance

README.md (194 lines changed)

@@ -1,193 +1 @@
# This repository is deprecated, please use https://github.com/jvoisin/mat2 instead
```
_____ _____ _____ ___
| | _ |_ _|_ | Keep your data,
| | | | |_| | | | | _| trash your meta!
|_|_|_|_| |_| |_| |___|
```
# Metadata and privacy
Metadata consist of information that characterizes data.
Metadata are used to provide documentation for data products.
In essence, metadata answer who, what, when, where, why, and how about
every facet of the data that are being documented.
Metadata within a file can tell a lot about you.
Cameras record data about when a picture was taken and what
camera was used. Office documents, such as PDF or Office files, automatically
embed author and company information.
Maybe you don't want to disclose that information.
This is precisely the job of mat2: getting rid, as much as possible, of
metadata.
mat2 provides:
- a library called `libmat2`;
- a command line tool called `mat2`,
- a service menu for Dolphin, KDE's default file manager
If you prefer a regular graphical user interface, you might be interested in
[Metadata Cleaner](https://metadatacleaner.romainvigier.fr/), which is using
`mat2` under the hood.
# Requirements
- `python3-mutagen` for audio support
- `python3-gi-cairo` and `gir1.2-poppler-0.18` for PDF support
- `gir1.2-gdkpixbuf-2.0` for images support
- `gir1.2-rsvg-2.0` for svg support
- `FFmpeg`, optionally, for video support
- `libimage-exiftool-perl` for everything else
- `bubblewrap`, optionally, for sandboxing
Please note that mat2 requires at least Python3.5.
# Requirements setup on macOS (OS X) using [Homebrew](https://brew.sh/)
```bash
brew install exiftool cairo pygobject3 poppler gdk-pixbuf librsvg ffmpeg
```
# Running the test suite
```bash
$ python3 -m unittest discover -v
```
And if you want to see the coverage:
```bash
$ python3-coverage run --branch -m unittest discover -s tests/
$ python3-coverage report -m --include 'libmat2/*'
```
# How to use mat2
```
usage: mat2 [-h] [-V] [--unknown-members policy] [--inplace] [--no-sandbox]
            [-v] [-l] [--check-dependencies] [-L | -s]
            [files [files ...]]

Metadata anonymisation toolkit 2

positional arguments:
  files                 the files to process

optional arguments:
  -h, --help            show this help message and exit
  -V, --verbose         show more verbose status information
  --unknown-members policy
                        how to handle unknown members of archive-style files
                        (policy should be one of: abort, omit, keep) [Default:
                        abort]
  --inplace             clean in place, without backup
  --no-sandbox          Disable bubblewrap's sandboxing
  -v, --version         show program's version number and exit
  -l, --list            list all supported fileformats
  --check-dependencies  check if mat2 has all the dependencies it needs
  -L, --lightweight     remove SOME metadata
  -s, --show            list harmful metadata detectable by mat2 without
                        removing them
```
Note that mat2 **will not** clean files in-place, but will produce, for
example, with a file named "myfile.png" a cleaned version named
"myfile.cleaned.png".
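The naming rule described above can be sketched in a few lines of Python. This is only an illustration: `cleaned_name` is a hypothetical helper, not a function from `libmat2`, and it assumes the `.cleaned` marker is inserted before the last extension only.

```python
import os


def cleaned_name(path: str) -> str:
    """Insert '.cleaned' before the file's last extension (hypothetical helper)."""
    root, ext = os.path.splitext(path)
    return root + '.cleaned' + ext


print(cleaned_name('myfile.png'))  # myfile.cleaned.png
```

Applying the helper twice yields a `.cleaned.cleaned` name, which matches the `clean.cleaned.cleaned.html` file that the test suite removes.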
## Web interface
It's possible to run mat2 as a web service, via
[mat2-web](https://0xacab.org/jvoisin/mat2-web).
If you're using WordPress, you might be interested in [wp-mat](https://git.autistici.org/noblogs/wp-mat)
and [wp-mat-server](https://git.autistici.org/noblogs/wp-mat-server).
## Desktop GUI
For GNU/Linux desktops, it's possible to use the
[Metadata Cleaner](https://gitlab.com/rmnvgr/metadata-cleaner) GTK application.
# Supported formats
The following formats are supported: avi, bmp, css, epub/ncx, flac, gif, jpeg,
m4a/mp2/mp3/…, mp4, odc/odf/odg/odi/odp/ods/odt/…, off/opus/oga/spx/…, pdf,
png, ppm, pptx/xlsx/docx/…, svg/svgz/…, tar/tar.gz/tar.bz2/tar.xz/…, tiff,
torrent, wav, wmv, zip, …
# Notes about detecting metadata
While mat2 is doing its very best to display metadata when the `--show` flag is
passed, it doesn't mean that a file is clean from any metadata if mat2 doesn't
show any. There is no reliable way to detect every single possible metadata for
complex file formats.
This is why you shouldn't rely on metadata's presence to decide if your file must
be cleaned or not.
# Notes about the lightweight mode
By default, mat2 might slightly alter the data of your files in order to remove
as much metadata as possible. For example, text in PDF files might not be selectable anymore,
compressed images might get compressed again, …
Since some users might be willing to accept some remaining metadata in exchange
for the guarantee that mat2 won't modify the data of their files, there is the
`-L` flag, which does precisely that.
# Related software
- The first iteration of [MAT](https://mat.boum.org)
- [Exiftool](https://sno.phy.queensu.ca/~phil/exiftool/mat)
- [pdf-redact-tools](https://github.com/firstlookmedia/pdf-redact-tools), that
tries to deal with *printer dots* too.
- [pdfparanoia](https://github.com/kanzure/pdfparanoia), that removes
watermarks from PDF.
- [Scrambled Exif](https://f-droid.org/packages/com.jarsilio.android.scrambledeggsif/),
an open-source Android application to remove metadata from pictures.
- [Dangerzone](https://dangerzone.rocks/), designed to sanitize harmful documents
into harmless ones.
# Contact
If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues)
or the [mailing list](https://www.autistici.org/mailman/listinfo/mat-dev)
Should a more private contact be needed (eg. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat2@dustri.org`,
using the gpg key `9FCDEE9E1A381F311EA62A7404D041E8171901CC`.
# Donations
If you want to donate some money, please give it to [Tails]( https://tails.boum.org/donate/?r=contribute ).
# License
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Copyright 2018 Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org>
Copyright 2016 Marie-Rose for mat2's logo
The `tests/data/dirty_with_nsid.docx` file is licensed under GPLv3,
and was borrowed from the Calibre project: https://calibre-ebook.com/downloads/demos/demo.docx
The `narrated_powerpoint_presentation.pptx` file is in the public domain.
# Thanks
mat2 wouldn't exist without:
- the [Google Summer of Code](https://summerofcode.withgoogle.com/);
- the fine people from [Tails]( https://tails.boum.org);
- friends
Many thanks to them!


@@ -19,14 +19,14 @@ details.
 # jpegoptim, optipng, …
 While designed to reduce as much as possible the size of pictures,
-those software can be used to remove metadata. They usually have very good
+those software can be used to remove metadata. They usually have excellent
 support for a single picture format, and can be used in place of mat2 for them.
 # PDF Redact Tools
 [PDF Redact Tools](https://github.com/firstlookmedia/pdf-redact-tools) is
-a software developed by the people from [First Look
+software developed by the people from [First Look
 Media](https://firstlook.media/), the entity behind, amongst other things,
 [The Intercept](https://theintercept.com/).
@@ -34,13 +34,13 @@ The tool uses roughly the same approach than mat2 to deal with PDF,
 which is unfortunately the only fileformat that it does support.
 It's interesting to note that it has counter-measures against
 [yellow dots](https://en.wikipedia.org/wiki/Machine_Identification_Code),
-a capacity that mat2 [doesn't possess yet](https://0xacab.org/jvoisin/mat2/issues/43).
+a capacity that mat2 doesn't have.
 # Exiv2
 [Exiv2](https://www.exiv2.org/) was considered for mat2,
-but it currently [misses a lot of metadata](https://0xacab.org/jvoisin/mat2/issues/85)
+but it currently misses a lot of metadata.
 # Others non open source software/online service
# Others non open source software/online service # Others non open source software/online service


@@ -84,7 +84,7 @@ but keep in mind by doing so, some metadata \fBwon't be cleaned\fR.
 While mat2 does its very best to remove every single metadata,
 it's still in beta, and \fBsome\fR might remain. Should you encounter
-some issues, check the bugtracker: https://0xacab.org/jvoisin/mat2/issues
+some issues, check the bugtracker: https://github.com/jvoisin/mat2/issues
 .PP
 Please use accordingly and be careful.


@@ -152,7 +152,10 @@ class ArchiveBasedAbstractParser(abstract.AbstractParser):
                                 self.filename, member_name, full_path)
                     break
-                zin.extract(member=item, path=temp_folder)
+                try:
+                    zin.extract(member=item, path=temp_folder)
+                except OSError as e:
+                    logging.error("Unable to extraxt %s from %s: %s", item, self.filename, e)
                 os.chmod(full_path, stat.S_IRUSR)
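
The hunk above wraps `zin.extract` in a `try`/`except OSError` so that a failure such as the `No space left on device` error from the commit message is logged instead of aborting. A minimal standalone sketch of the same pattern follows; `extract_members` is a hypothetical helper written for illustration (it is not part of `libmat2`) and it relies only on the standard `zipfile` and `logging` modules.

```python
import logging
import zipfile
from typing import List


def extract_members(archive_path: str, dest: str) -> List[str]:
    """Extract every member of a zip archive into dest.

    Hypothetical helper: an OSError on one member is logged and the
    member's name is collected, instead of letting the exception escape.
    """
    failed = []
    with zipfile.ZipFile(archive_path) as zin:
        for item in zin.infolist():
            try:
                zin.extract(member=item, path=dest)
            except OSError as e:  # e.g. [Errno 28] No space left on device
                logging.error("Unable to extract %s from %s: %s",
                              item.filename, archive_path, e)
                failed.append(item.filename)
    return failed
```

Returning the list of failed members is a design choice of this sketch, so that a caller can decide whether a partial extraction is acceptable; the hunk itself only logs the error.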


@@ -196,3 +196,15 @@ class HEICParser(exiftool.ExiftoolParser):
     def remove_all(self) -> bool:
         return self._lightweight_cleanup()
+class WEBPParser(GdkPixbufAbstractParser):
+    mimetypes = {'image/webp'}
+    meta_allowlist = {'SourceFile', 'ExifToolVersion', 'FileName',
+                      'Directory', 'FileSize', 'FileModifyDate',
+                      'FileAccessDate', "FileInodeChangeDate",
+                      'FilePermissions', 'FileType', 'FileTypeExtension',
+                      'MIMEType', 'ImageWidth', 'ImageSize', 'BitsPerSample',
+                      'ColorComponents', 'EncodingProcess', 'JFIFVersion',
+                      'ResolutionUnit', 'XResolution', 'YCbCrSubSampling',
+                      'YResolution', 'Megapixels', 'ImageHeight', 'Orientation',
+                      'HorizontalScale', 'VerticalScale', 'VP8Version'}
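
The diff doesn't show how the parent class consumes `meta_allowlist`. A plausible reading, and it is only an assumption, is that allowlisted fields describe the file itself and are therefore left out of the metadata that gets reported. The sketch below illustrates that idea with a hypothetical `reported_meta` function and a shortened allowlist.

```python
from typing import Dict, Set


def reported_meta(raw: Dict[str, object], allowlist: Set[str]) -> Dict[str, object]:
    """Keep only the fields that are NOT on the allowlist (hypothetical helper)."""
    return {key: value for key, value in raw.items() if key not in allowlist}


# Shortened allowlist, for illustration only.
allowlist = {'SourceFile', 'FileName', 'ImageWidth', 'ImageHeight', 'VP8Version'}
raw = {
    'FileName': 'dirty.webp',
    'ImageWidth': 100,
    'Warning': '[minor] Improper EXIF header',
}
print(reported_meta(raw, allowlist))  # {'Warning': '[minor] Improper EXIF header'}
```

This reading is consistent with the commented-out webp tests, which expect the `Warning` field, absent from the allowlist, to show up in the reported metadata.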


@@ -11,9 +11,9 @@ dependencies = [
     'pycairo',
 ]
 [project.urls]
-Repository = "https://0xacab.org/jvoisin/mat2"
-Issues = "https://0xacab.org/jvoisin/mat2/-/issues"
-Changelog = "https://0xacab.org/jvoisin/mat2/-/blob/master/CHANGELOG.md"
+Repository = "https://github.com/jvoisin/mat2"
+Issues = "https://github.com/jvoisin/mat2/issues"
+Changelog = "https://github.com/jvoisin/mat2/blob/master/CHANGELOG.md"
 [tool.ruff]
 target-version = "py39"


@@ -11,7 +11,7 @@ setuptools.setup(
     description="A handy tool to trash your metadata",
     long_description=long_description,
     long_description_content_type="text/markdown",
-    url="https://0xacab.org/jvoisin/mat2",
+    url="https://github.com/jvoisin/mat2",
     python_requires = '>=3.5.0',
     scripts=['mat2'],
     install_requires=[
@@ -31,6 +31,6 @@ setuptools.setup(
         "Intended Audience :: End Users/Desktop",
     ],
     project_urls={
-        'bugtacker': 'https://0xacab.org/jvoisin/mat2/issues',
+        'bugtacker': 'https://github.com/jvoisin/mat2/issues',
     },
 )

tests/data/dirty.webp (new binary file, 38 KiB, not shown)


@@ -236,6 +236,11 @@ class TestGetMeta(unittest.TestCase):
         self.assertIn(b'i am a : various comment', stdout)
         self.assertIn(b'artist: jvoisin', stdout)
+    #def test_webp(self):
+    #    proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.webp'],
+    #            stdout=subprocess.PIPE)
+    #    stdout, _ = proc.communicate()
+    #    self.assertIn(b'Warning: [minor] Improper EXIF header', stdout)
 class TestControlCharInjection(unittest.TestCase):
     def test_jpg(self):


@@ -4,6 +4,7 @@ import unittest
 import shutil
 import os
 import re
+import sys
 import tarfile
 import tempfile
 import zipfile
@@ -113,6 +114,11 @@ class TestGetMeta(unittest.TestCase):
         meta = p.get_meta()
         self.assertEqual(meta['Comment'], 'Created with GIMP')
+    #def test_webp(self):
+    #    p = images.WEBPParser('./tests/data/dirty.webp')
+    #    meta = p.get_meta()
+    #    self.assertEqual(meta['Warning'], '[minor] Improper EXIF header')
     def test_ppm(self):
         p = images.PPMParser('./tests/data/dirty.ppm')
         meta = p.get_meta()
@@ -333,6 +339,11 @@ class TestCleaning(unittest.TestCase):
             'parser': images.JPGParser,
             'meta': {'Comment': 'Created with GIMP'},
             'expected_meta': {},
+        #}, {
+        #    'name': 'webp',
+        #    'parser': images.WEBPParser,
+        #    'meta': {'Warning': '[minor] Improper EXIF header'},
+        #    'expected_meta': {},
         }, {
             'name': 'wav',
             'parser': audio.WAVParser,
@@ -526,7 +537,40 @@ class TestCleaning(unittest.TestCase):
             'parser': images.HEICParser,
             'meta': {},
             'expected_meta': {
+                'BlueMatrixColumn': '0.14305 0.06061 0.71393',
+                'BlueTRC': '(Binary data 32 bytes, use -b option to extract)',
+                'CMMFlags': 'Not Embedded, Independent',
+                'ChromaticAdaptation': '1.04788 0.02292 -0.05022 0.02959 0.99048 -0.01707 -0.00925 0.01508 0.75168',
+                'ChromaticityChannel1': '0.64 0.33002',
+                'ChromaticityChannel2': '0.3 0.60001',
+                'ChromaticityChannel3': '0.15001 0.06',
+                'ChromaticityChannels': 3,
+                'ChromaticityColorant': 'Unknown',
+                'ColorSpaceData': 'RGB ',
+                'ConnectionSpaceIlluminant': '0.9642 1 0.82491',
+                'DeviceAttributes': 'Reflective, Glossy, Positive, Color',
+                'DeviceManufacturer': '',
+                'DeviceMfgDesc': 'GIMP',
+                'DeviceModel': '',
+                'DeviceModelDesc': 'sRGB',
                 'ExifByteOrder': 'Big-endian (Motorola, MM)',
+                'GreenMatrixColumn': '0.38512 0.7169 0.09706',
+                'GreenTRC': '(Binary data 32 bytes, use -b option to extract)',
+                'MediaWhitePoint': '0.9642 1 0.82491',
+                'PrimaryPlatform': 'Apple Computer Inc.',
+                'ProfileCMMType': 'Little CMS',
+                'ProfileClass': 'Display Device Profile',
+                'ProfileConnectionSpace': 'XYZ ',
+                'ProfileCopyright': 'Public Domain',
+                'ProfileCreator': 'Little CMS',
+                'ProfileDateTime': '2022:05:15 16:29:22',
+                'ProfileDescription': 'GIMP built-in sRGB',
+                'ProfileFileSignature': 'acsp',
+                'ProfileID': 0,
+                'ProfileVersion': '4.3.0',
+                'RedMatrixColumn': '0.43604 0.22249 0.01392',
+                'RedTRC': '(Binary data 32 bytes, use -b option to extract)',
+                'RenderingIntent': 'Perceptual',
                 'Warning': 'Bad IFD0 directory',
             },
         }
@@ -563,13 +607,13 @@ class TestCleaning(unittest.TestCase):
                 meta = p2.get_meta()
                 if meta:
                     for k, v in p2.get_meta().items():
-                        self.assertIn(k, case['expected_meta'], '"%s" is not in "%s" (%s)' % (k, case['expected_meta'], case['name']))
+                        self.assertIn(k, case['expected_meta'], '"%s" is not in "%s" (%s), with all of them being %s' % (k, case['expected_meta'], case['name'], p2.get_meta().items()))
                         if str(case['expected_meta'][k]) in str(v):
                             continue
                         if 'extra_expected_meta' in case and k in case['extra_expected_meta']:
                             if str(case['extra_expected_meta'][k]) in str(v):
                                 continue
-                        self.assertTrue(False, "got a different value (%s) than excepted (%s) for %s" % (str(v), meta, k))
+                        self.assertTrue(False, "got a different value (%s) than excepted (%s) for %s, with all of them being %s" % (str(v), meta, k, p2.get_meta().items()))
                 self.assertTrue(p2.remove_all())
                 os.remove(target)
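
The loop in the hunk above accepts a leftover metadata field only if its key is listed in `expected_meta` and the expected value appears as a substring of the actual one, with `extra_expected_meta` as a fallback. A standalone sketch of that comparison is given below; `meta_mismatches` is a hypothetical function, and unlike the test, which fails on the first problem, it collects every problem in a list.

```python
from typing import Dict, List, Optional


def meta_mismatches(meta: Dict[str, object],
                    expected: Dict[str, object],
                    extra_expected: Optional[Dict[str, object]] = None) -> List[str]:
    """List the problems found in the metadata left after cleaning."""
    extra_expected = extra_expected or {}
    problems = []
    for key, value in meta.items():
        if key not in expected:
            problems.append('"%s" is not expected' % key)
            continue
        # Substring match, as in the test: an expected '4.3' accepts '4.3.0'.
        if str(expected[key]) in str(value):
            continue
        if key in extra_expected and str(extra_expected[key]) in str(value):
            continue
        problems.append('unexpected value %r for "%s"' % (value, key))
    return problems
```

An empty list means every remaining field was anticipated, which is what the HEIC case relies on after its `expected_meta` grew to include the colour profile fields.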
@@ -595,14 +639,20 @@ class TestCleaning(unittest.TestCase):
         os.remove('./tests/data/clean.cleaned.html')
         os.remove('./tests/data/clean.cleaned.cleaned.html')
-        with open('./tests/data/clean.html', 'w') as f:
-            f.write('<title><title><pouet/><meta/></title></title><test/>')
-        p = web.HTMLParser('./tests/data/clean.html')
-        self.assertTrue(p.remove_all())
-        with open('./tests/data/clean.cleaned.html', 'r') as f:
-            self.assertEqual(f.read(), '<title></title><test/>')
+        if sys.version_info >= (3, 13):
+            with open('./tests/data/clean.html', 'w') as f:
+                f.write('<title><title><pouet/><meta/></title></title><test/>')
+            with self.assertRaises(ValueError):
+                p = web.HTMLParser('./tests/data/clean.html')
+        else:
+            with open('./tests/data/clean.html', 'w') as f:
+                f.write('<title><title><pouet/><meta/></title></title><test/>')
+            p = web.HTMLParser('./tests/data/clean.html')
+            self.assertTrue(p.remove_all())
+            with open('./tests/data/clean.cleaned.html', 'r') as f:
+                self.assertEqual(f.read(), '<title></title><test/>')
+            os.remove('./tests/data/clean.cleaned.html')
         os.remove('./tests/data/clean.html')
-        os.remove('./tests/data/clean.cleaned.html')
         with open('./tests/data/clean.html', 'w') as f:
             f.write('<test><title>Some<b>metadata</b><br/></title></test>')


@@ -23,6 +23,11 @@ class TestLightWeightCleaning(unittest.TestCase):
             'parser': images.JPGParser,
             'meta': {'Comment': 'Created with GIMP'},
             'expected_meta': {},
+        #}, {
+        #    'name': 'webp',
+        #    'parser': images.WEBPParser,
+        #    'meta': {'Warning': '[minor] Improper EXIF header'},
+        #    'expected_meta': {},
         }, {
             'name': 'torrent',
             'parser': torrent.TorrentParser,