mirror of https://0xacab.org/jvoisin/mat2 synced 2025-10-06 16:42:57 +02:00

179 Commits
0.1.3 ... 0.7.0

Author SHA1 Message Date
jvoisin
6b45064c78 Bump the changelog 2019-02-17 17:02:17 +01:00
jvoisin
a81b7658a8 Make the mandatory metadata warning generic
This should close #95.
2019-02-10 21:46:13 +01:00
jvoisin
6e63e03b86 Streamline a bit the previous commit 2019-02-09 15:23:16 +01:00
Poncho
a71488d459 bind mount /etc/ld.so.cache to the sandbox
without /etc/ld.so.cache available in the sandbox, tests fail on gentoo with:
/usr/bin/ffmpeg: error while loading shared libraries: libstdc++.so.6:
    cannot open shared object file: No such file or directory
2019-02-09 09:49:51 +01:00
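As an aside, here is a minimal sketch (not mat2's actual code) of how an external tool such as ffmpeg could be wrapped in bubblewrap, with `/etc/ld.so.cache` bind-mounted read-only so dynamically linked binaries can resolve their shared libraries inside the sandbox; the function name and the exact set of binds are illustrative and vary between distributions:

```python
import shutil
import subprocess

def run_sandboxed(command: list) -> subprocess.CompletedProcess:
    """Run `command` under bwrap when available, otherwise directly."""
    bwrap = shutil.which('bwrap')
    if bwrap is None:  # bubblewrap isn't installed: fall back to a plain call
        return subprocess.run(command, check=True)
    sandbox = [
        bwrap,
        '--ro-bind', '/usr', '/usr',
        '--ro-bind', '/lib', '/lib',
        '--ro-bind', '/etc/ld.so.cache', '/etc/ld.so.cache',  # the bind discussed above
        '--dev', '/dev',
        '--unshare-all',
    ]
    return subprocess.run(sandbox + command, check=True)

run_sandboxed(['/usr/bin/ffmpeg', '-version'])
```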
jvoisin
6ef6aaa222 Improve a bit get_meta for libreoffice files 2019-02-08 23:23:56 +01:00
jvoisin
6cc034e81b Add support for html files 2019-02-08 23:05:18 +01:00
jvoisin
e1dd439fc8 Use of the archive refactoring for the office documents too 2019-02-07 22:19:37 +01:00
jvoisin
b9a62d798a Refactor a bit office get_meta handling
This should make it easier to get more metadata from
archive-based file formats.
2019-02-04 00:31:26 +01:00
jvoisin
54e50450ad Fix the return code on parsers' list display 2019-02-03 21:09:12 +01:00
jvoisin
433609f8ea Implement .gif support 2019-02-03 21:01:58 +01:00
intrigeri
e8c1bb0e3c Whenever possible, use bwrap for subprocesses
This should close #90
2019-02-03 19:18:41 +01:00
jvoisin
8b5d0c286c Document how to get the coverage from the testsuite 2019-02-03 18:33:25 +01:00
jvoisin
8e84ba547a Add support for wmv 2019-02-02 19:19:36 +01:00
jvoisin
812bf2553b Rename the internal class used by the nautilus extension
This should solve collisions with people like me who
are copy/pasting the documentation, creating conflicts
with other extensions that are doing the very same thing.
2019-01-16 23:10:17 +01:00
Alan
94cdca1ed2 Update debian packaging status 2018-12-15 17:05:37 +01:00
Alan
b755aba8ea Fix debian build instructions 2018-12-15 17:05:32 +01:00
jvoisin
edce78859b Add a note in the readme about -L and pdf 2018-12-08 18:39:56 +01:00
jvoisin
0ab17b973b mat2 is now available on pypi 2018-11-11 20:49:24 +01:00
jvoisin
389311475c Add a readme for the nautilus extension 2018-11-11 19:58:51 +01:00
jvoisin
505be24be9 Bump the changelog 2018-11-10 12:46:31 +01:00
jvoisin
ef8265e86a Remove a useless image 2018-11-10 10:54:13 +01:00
jvoisin
1d75451b77 Add some type annotations to the nautilus extension 2018-11-08 21:40:33 +01:00
jvoisin
dc35ef56c8 Add a missing file :/ 2018-11-07 22:20:31 +01:00
jvoisin
3aa76cc58e Prove that the previous commit is working 2018-11-07 22:13:36 +01:00
jvoisin
8ff57c5803 Do not display control characters in output
Kudos to Sherry Taylor for reporting this issue ♥
2018-11-07 22:07:46 +01:00
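A hedged illustration of the kind of fix involved: strip Unicode control characters (category `Cc`) from metadata values before printing them, so that a crafted file can't inject terminal escape sequences via `--show`. The helper name is made up for the example:

```python
import unicodedata

def sanitize(value: str) -> str:
    """Replace every control character by U+FFFD before display."""
    return ''.join(c if unicodedata.category(c) != 'Cc' else '\N{REPLACEMENT CHARACTER}'
                   for c in value)

print(sanitize('Author\x1b[2J'))  # the escape byte is neutralised
```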
jvoisin
04bb8c8ccf Add mp4 support 2018-10-28 07:41:04 -07:00
jvoisin
3a070b0ab7 Add support for zip files 2018-10-25 11:56:46 +02:00
jvoisin
283e5e5787 Improve archive-based parser's robustness against corrupted embedded files 2018-10-25 11:56:12 +02:00
jvoisin
513d897ea0 Implement get_meta() for archives 2018-10-25 11:29:50 +02:00
jvoisin
5a9dc388ad Minor refactorisation of how we're checking for exiftool's presence 2018-10-25 11:05:06 +02:00
jvoisin
5a08f5b7bf Add a test for tiff lightweight cleaning 2018-10-24 20:19:36 +02:00
jvoisin
fe885babee Implement lightweight cleaning for jpg 2018-10-24 19:35:07 +02:00
jvoisin
1040a594d6 Fix a stupid typo in the changelog 2018-10-23 17:13:53 +02:00
jvoisin
e510a225e3 Bump the changelog 2018-10-23 17:07:42 +02:00
jvoisin
a98962a0fa Document that FFmpeg is now an optional dependency 2018-10-23 16:57:18 +02:00
jvoisin
9a81b3adfd Improve type annotation coverage 2018-10-23 16:32:28 +02:00
jvoisin
f1a071d460 Implement lightweight cleaning for png and tiff 2018-10-23 16:22:11 +02:00
jvoisin
38df679a88 Optimize the handling of problematic files 2018-10-23 13:49:58 +02:00
jvoisin
44f267a596 Improve problematic filenames support 2018-10-22 16:56:05 +02:00
jvoisin
5bc88faedf Fix the testsuite on fedora 2018-10-22 13:55:09 +02:00
jvoisin
83389a63e9 Test mat2's reliability wrt. corrupted video files 2018-10-22 13:42:04 +02:00
jvoisin
e70ea811c9 Implement support for .avi files, via ffmpeg
- This commit introduces optional dependencies (namely ffmpeg):
  mat2 will emit a warning when trying to process an .avi file
  if ffmpeg isn't installed.
- Since metadata are obtained via exiftool, this commit
  also refactors our exiftool wrapper a bit.
2018-10-22 12:58:01 +02:00
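A rough sketch, with assumed names, of how an optional external dependency like ffmpeg can be probed at runtime, so that mat2 only warns instead of crashing when it is missing:

```python
import logging
import shutil

def require_ffmpeg() -> str:
    """Return the path to ffmpeg, or raise after logging a warning."""
    path = shutil.which('ffmpeg')
    if path is None:
        logging.warning("ffmpeg isn't installed: .avi files can't be processed")
        raise RuntimeError('ffmpeg is missing')
    return path
```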
jvoisin
2ae5d909c3 Make pyflakes happy 2018-10-18 21:22:28 +02:00
jvoisin
5896387ade Output metadata in a sorted fashion 2018-10-18 21:17:12 +02:00
jvoisin
d4c050a738 wtf python 2018-10-18 20:29:50 +02:00
jvoisin
f04d4b28fc Fix the tests on Debian? 2018-10-18 20:23:00 +02:00
jvoisin
da88d30689 Fix the CI on debian 2018-10-14 10:59:50 +02:00
Rémi Oudin
f1552b2ccb Make testsuite fail if coverage is under 100%
Fixes issue #61
2018-10-12 17:07:56 +02:00
jvoisin
2ba38dd2a1 Bump mypy typing coverage 2018-10-12 14:32:09 +02:00
jvoisin
b832a59414 Refactor lightweight mode implementation 2018-10-12 11:49:24 +02:00
Sébastien Helleu
6ce88b8b7f Fix typo in README 2018-10-11 21:40:58 +02:00
jvoisin
2444caccc0 Make pylint happier 2018-10-11 19:55:07 +02:00
jvoisin
b9dbd12ef9 Implement recursive metadata for FLAC files
Since FLAC files can contain covers, it makes sense
to parse their metadata too.
2018-10-11 19:52:47 +02:00
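A small sketch of what recursing into FLAC covers can look like with mutagen; `FLAC(...).pictures` is a real mutagen attribute, while the surrounding function is illustrative only (a real implementation would hand each picture to the matching image parser):

```python
from mutagen.flac import FLAC

def get_cover_meta(path: str) -> dict:
    """Collect basic information about every picture embedded in a FLAC file."""
    meta = {}
    for idx, picture in enumerate(FLAC(path).pictures):
        meta['cover_%d' % idx] = {'mime': picture.mime, 'size': len(picture.data)}
    return meta
```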
jvoisin
b2e153b69c Delete pictures of FLAC files 2018-10-11 18:15:11 +02:00
Simon Magnin
35dca4bf1c add recursivity for archive style files 2018-10-11 08:28:02 -07:00
jvoisin
4ed30b5e00 Add the mailing list announcement to the release process 2018-10-06 20:00:50 +02:00
jvoisin
0d25b18d26 Improve both the typing and the comments 2018-10-05 17:07:58 +02:00
jvoisin
d0f3534eff Hide unsupported extensions in mat2 -l 2018-10-05 12:43:21 +02:00
jvoisin
8675706c93 Improve the display of mat2 when no metadata are found
This should close #74
2018-10-05 12:35:35 +02:00
Poncho
5e196ecef8 Update logo
Use color palette and size according to
https://developer.gnome.org/hig/stable/icon-design.html.en
2018-10-05 11:13:31 +02:00
jvoisin
8e98593b02 Trash word/people.xml in office files 2018-10-04 16:28:20 +02:00
jvoisin
df252fd71a Remove a superfluous import 2018-10-04 16:19:38 +02:00
jvoisin
a1c39104fc Make the testsuite runnable on the installed MAT2 2018-10-04 16:16:52 +02:00
georg
34fbd633fd libmat2: fix shebang
Relates 0a2a398c9c
2018-10-03 18:38:28 +00:00
jvoisin
f1ceed13b5 Bump the changelog 2018-10-03 16:38:05 +02:00
jvoisin
5a5c642a46 Don't break office files for MS Office
We didn't take the whitelist into account while
removing dangling files from [Content_types].xml
2018-10-03 16:38:05 +02:00
jvoisin
84e302ac93 Remove file left behind by the testsuite 2018-10-03 16:38:05 +02:00
jvoisin
7901fdef2e Fix the testsuite 2018-10-03 15:29:46 +02:00
jvoisin
1b356b8c6f Improve mat2's cli reliability
- Replace some class members by instance members
- Don't thread the cleaning process anymore for now
2018-10-03 15:22:36 +02:00
jvoisin
c67bbafb2c Use [Content_Types].xml to improve MS Office coverage 2018-10-02 11:55:42 -07:00
georg
5b606f939d fix typo 2018-10-02 16:01:24 +00:00
jvoisin
156e81fb4c Check that cleaning twice doesn't break the file 2018-10-02 16:05:51 +02:00
jvoisin
9578e4b4ee Silence a bit the testsuite 2018-10-02 15:26:13 +02:00
jvoisin
a46a7eb6fa Update the CONTRIBUTING.md file wrt. the previous commit 2018-10-02 11:12:50 +02:00
georg
a24c59b208 manpage: this is about mat2, not mat 2018-10-01 21:26:59 +00:00
jvoisin
652b8e519f Files processed via MAT2 are now accepted without warnings by MS Office 2018-10-01 12:25:37 -07:00
jvoisin
c14be47f95 Fix a typo in the README spotted by @georg 2018-10-01 15:51:22 +02:00
jvoisin
81a3881aa4 Please mypy 2018-09-30 19:55:17 +02:00
jvoisin
e342671ead Remove dangling references in MS Office's [Content_types].xml 2018-09-30 19:53:18 +02:00
jvoisin
212d9c472c Document mat2's output scheme in the manpage as well 2018-09-26 00:13:44 +02:00
jvoisin
a88107c9ca Document the output scheme in the README 2018-09-26 00:11:16 +02:00
jvoisin
7f629ed2e3 Run the testsuite exclusively on Whitewhale for now
This should fix the intermittent failures, thanks
to @pollo for the tip
2018-09-25 17:09:04 +02:00
jvoisin
719cdf20fa Second pass of minor formatting 2018-09-24 20:15:07 +02:00
jvoisin
2e243355f5 Fix some minor formatting issues 2018-09-24 19:50:24 +02:00
jvoisin
174d4a0ac0 Implement rsid stripping for office files
MS Office XML rsid is a "unique identifier used to track the editing session
when the physical character representing this section mark was last formatted."

See the following links for details:
- https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
- https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/.
2018-09-24 18:03:59 +02:00
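As an illustration of the technique (assumed names, simplified namespace handling), rsid stripping boils down to deleting every attribute in the WordprocessingML namespace whose local name starts with `rsid`:

```python
import xml.etree.ElementTree as ET

W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

def strip_rsid(document_xml: bytes) -> bytes:
    """Remove w:rsid* attributes (w:rsidR, w:rsidRDefault, ...) from a document part."""
    root = ET.fromstring(document_xml)
    for element in root.iter():
        for attribute in list(element.attrib):
            if attribute.startswith(W_NS + 'rsid'):
                del element.attrib[attribute]
    return ET.tostring(root)
```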
jvoisin
fbcf68c280 Lexicographical sort on xml attributes for office files
In XML, the order of the attributes shouldn't be meaningful;
however, MS Office sorts attributes for a given XML tag
differently than LibreOffice does.
2018-09-24 17:45:09 +02:00
jvoisin
9826de3526 Add a test for zip ordering 2018-09-20 14:04:46 +02:00
jvoisin
ab71c29a28 Make pyflakes happy 2018-09-20 01:19:22 +02:00
jvoisin
3d2842802c Split the tests 2018-09-20 01:13:59 +02:00
jvoisin
a1a06d023e Insert archive members in lexicographic order 2018-09-18 22:44:21 +02:00
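A minimal sketch of the idea, with an illustrative function name: rewrite the archive with its members added in lexicographic order, so the output no longer leaks the original insertion order:

```python
import zipfile

def rewrite_sorted(src: str, dst: str) -> None:
    """Copy a zip archive, inserting its members in lexicographic order."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, 'w') as zout:
        for name in sorted(zin.namelist()):
            zout.writestr(name, zin.read(name))
```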
jvoisin
9275d64be5 Add a link to the gentoo overlay 2018-09-17 21:11:48 +02:00
Yoann Lamouroux
0a2a398c9c trivial modification of all shebang.
`/usr/bin/python3` -> `/usr/bin/env python3`

It's always better to trust the environment-defined path to bin/python, as
virtualenv has become the way to go.
2018-09-12 14:58:27 +02:00
jvoisin
5cf94bd256 Bump coverage back to 100% 2018-09-12 14:54:54 +02:00
jvoisin
de65f4f4d4 Improve the resilience of MAT2 wrt. corrupted PNG 2018-09-09 19:09:05 +02:00
jvoisin
759efa03ee Fix a setuptool-related warning 2018-09-06 11:42:07 +02:00
jvoisin
9fe6f1023b Make pylint happy 2018-09-06 11:36:04 +02:00
jvoisin
e3d817f57e Split office and archives 2018-09-06 11:34:14 +02:00
jvoisin
2e9adab86a Improve a cli test resilience 2018-09-06 11:32:29 +02:00
jvoisin
c8c27dcf38 Mention "Scrambled Exif" as related software 2018-09-06 11:20:08 +02:00
jvoisin
120b204988 Change a bit the previous commit 2018-09-06 11:13:11 +02:00
Daniel Kahn Gillmor
f3cef319b9 Unknown Members: make policy use an Enum
Closes #60

Note: this changeset also ensures that clean.cleaned.docx is removed
after the pytest run is over.
2018-09-05 18:59:33 -04:00
Daniel Kahn Gillmor
2d9ba81a84 spelling correction.
while mat2 has both a thread model (a thread pool that strips metadata
in parallel) and a threat model (a list of malicious adversaries and
their capabilities that we are trying to defeat), i think this
paragraph is talking about the latter.
2018-09-05 13:00:28 -04:00
jvoisin
072ee1814d Remove defusedxml support and document why 2018-09-05 18:41:08 +02:00
jvoisin
3649c0ccaf Remove short version of dangerous/advanced options 2018-09-05 17:48:14 +02:00
Christian
119085f28d Add missing dependencies for the Nautilus extension to INSTALL.md 2018-09-05 17:42:39 +02:00
Christian
e515d907d7 Make sure target directory exists, assume MAT2 is in parent directory 2018-09-05 17:42:13 +02:00
jvoisin
46bb1b83ea Improve the previous commit 2018-09-05 17:26:09 +02:00
Daniel Kahn Gillmor
1d7e374e5b office: try all members, even when one fails
the end result will be the same -- an abort -- but the user will get
to see all the warnings for a particular file, instead of getting them
one at a time.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor
915dc634c4 document all unknown/unhandlable files even on abort
This makes it easy to get a list of all files that mat2 doesn't know
how to handle, without having to choose -u keep or -u omit.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor
10d60bd398 add --unknown-members argument to mat2
This allows the user to make use of parser.unknown_member_policy for
archive formats.

At the suggestion of @jvoisin, it also prints a scary warning if the
user explicitly chooses 'keep'.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor
4192a2daa3 office: create policy for what to do about unknown members
previously, encountering an unknown member meant that any parser of
this type would abort.

now, the user can set parser.unknown_member_policy to either 'omit' or
'keep' if they don't want the current action of 'abort'

note that this causes pylint to complain about branching depth for
remove_all() because of the nuanced error-handling.  I've disabled
this check.
2018-09-04 16:13:33 -04:00
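In outline, the policy described in the commits above can be modelled as an Enum; the names below are illustrative and not necessarily the ones used in libmat2:

```python
import enum

class UnknownMemberPolicy(enum.Enum):
    ABORT = 'abort'   # default: refuse to clean the archive
    OMIT = 'omit'     # drop unknown members from the cleaned archive
    KEEP = 'keep'     # copy them verbatim, at the user's own risk

# Turning the cli string into a policy; an invalid value raises ValueError.
policy = UnknownMemberPolicy('omit')
```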
jvoisin
9ce458cb3b Update the release process to create signed tarballs 2018-09-03 14:28:00 +02:00
jvoisin
907fc591cc Bump the coverage back to 100% 2018-09-01 16:58:34 +02:00
jvoisin
8255293d1d Add a link to the mailing list 2018-09-01 16:45:20 +02:00
jvoisin
6b7e8ad8c0 Add a .mailmap file 2018-09-01 16:12:03 +02:00
jvoisin
b7a8622682 Bump the changelog 2018-09-01 16:00:41 +02:00
Daniel Kahn Gillmor
3e2890eb9e three minor spelling fixes 2018-09-01 06:47:22 -07:00
jvoisin
91e80527fc Add archlinux to the CI 2018-09-01 15:41:22 +02:00
jvoisin
7877ba0da5 Fix a minor formatting issue 2018-09-01 14:16:55 +02:00
dkg
e2634f7a50 Logging cleanup 2018-09-01 05:14:32 -07:00
jvoisin
aba9b72d2c Fix some leftovers from the previous commit 2018-08-26 01:10:48 +02:00
Antoine Tenart
15dd3d84ff nautilus: rename the nautilus plugin
Rename the Nautilus plugin (removing 'nautilus' from the file name) as
it already lives in its own 'nautilus' directory. The same argument
applies when installing the plugin in a distro.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-26 01:09:41 +02:00
Antoine Tenart
588466f4a8 INSTALL: add instructions for the Fedora copr
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-24 18:47:39 +02:00
Antoine Tenart
cf89ff45c2 gitignore: exclude all hidden files from being committed
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-24 09:14:05 +02:00
Antoine Tenart
f583d12564 nautilus: remove swp file
A .swp file was committed by mistake. Remove it.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-24 09:09:49 +02:00
jvoisin
1c72448e58 Improve the detection of unsupported extensions in uppercase 2018-08-23 21:28:37 +02:00
Antoine Tenart
f068621628 libmat2: images: fix handling of .JPG files
Pixbuf only supports .jpeg files, not .jpg, so libmat2 looks for such an
extension and converts it if necessary. As this check is case sensitive,
processing .JPG files does not work.

Fixes #47.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-23 20:43:27 +02:00
jvoisin
fe09d81ab1 Don't forget to tell the downstreams about new releases 2018-08-19 15:51:44 +02:00
jvoisin
5be66dbe91 Mention the Arch linux's AUR package of MAT2 2018-08-19 15:51:23 +02:00
jvoisin
ee496cfa7f Fix a typo spotted by @Francois_B 2018-08-19 15:51:09 +02:00
jvoisin
6e2e411a2a Add an INSTALL.md file 2018-08-08 20:45:09 +02:00
jvoisin
2ce1dc793e Bump the changelog 2018-08-03 22:20:24 +02:00
jvoisin
e27768824a Change mat2's logo 2018-08-03 21:45:41 +02:00
jvoisin
36c5bad140 Improve our .gitignore 2018-07-30 23:00:33 +02:00
jvoisin
b5a9520a60 Add a cli-related test 2018-07-30 22:54:41 +02:00
jvoisin
a1257c538b Add some tests about pathological files 2018-07-30 22:36:36 +02:00
Antoine Tenart
6d8e999f12 Rename image to icon in the Nautilus extension
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-26 09:01:27 +02:00
Antoine Tenart
1bc4c7aac9 Switch columns in the Nautilus extension
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-26 09:01:01 +02:00
Antoine Tenart
03245a8731 Rename the Nautilus path column to file
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-26 08:57:33 +02:00
Antoine Tenart
27445e9134 Rename the Nautilus exit button to close
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-26 08:57:09 +02:00
jvoisin
b32ba9f736 Improve a bit nautilus' popup 2018-07-25 22:48:05 +02:00
jvoisin
e9f28edf73 Add a man page and document how to keep it up to date 2018-07-24 22:34:33 +02:00
jvoisin
7697f9c085 Improve the linters' coverage 2018-07-23 23:55:45 +02:00
jvoisin
e920083559 The Nautilus extension is now working 2018-07-23 23:39:06 +02:00
georg
71b1ced842 AbstractParser: Fix typos 2018-07-21 00:46:48 +00:00
jvoisin
942859601d Improve the code's documentation 2018-07-19 23:10:27 +02:00
jvoisin
565cb66d14 Minor simplification in how we're handling xml for office files 2018-07-19 22:55:08 +02:00
jvoisin
052a356750 Implement a much better Nautilus extension thanks to @atenart
Co-authored-by: Antoine Tenart <antoine.tenart@ack.tf>
Co-authored-by: jvoisin <julien.voisin@dustri.org>
2018-07-19 00:11:30 +02:00
jvoisin
2f670651cf Minor cleanup of the Nautilus extension's code 2018-07-18 23:20:51 +02:00
jvoisin
0cd510938a Minor code simplification 2018-07-18 23:15:47 +02:00
jvoisin
dc026f99ad Show if files are supported or not in the Nautilus extension 2018-07-18 23:12:55 +02:00
jvoisin
0aac0d644d Show a pretty icon for files in the Nautilus extension 2018-07-18 22:53:56 +02:00
jvoisin
17e69b6005 Change a button in the nautilus extension 2018-07-18 22:39:18 +02:00
jvoisin
cf5f3b268d Add a separator for the Nautilus extension 2018-07-18 22:39:10 +02:00
jvoisin
a5eede9a21 Remove the disclaimer from the Nautilus extension 2018-07-18 22:38:42 +02:00
Antoine Tenart
926e8dac5f nautilus: first working version
Improve the nautilus extension to get to a first working version:
- Single and multiple selections are working.
- The menu item is only there if mat2 has a chance to work on the
  selected files.
- Errors are reported using notifications.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-18 22:38:05 +02:00
georg
edc5f86552 README: Fix typo 2018-07-16 15:09:22 +00:00
jvoisin
84d50f97c0 Add a check for a missed dependency in ./mat2 -c 2018-07-15 17:00:01 +02:00
jvoisin
8093dce88e Bump the changelog 2018-07-10 21:41:24 +02:00
jvoisin
5a7c7f35f7 Remove print from libmat, and use the logging module instead
This should close #28
2018-07-10 21:30:38 +02:00
jvoisin
d5861e4653 Implement a check for dependencies in mat2
Example use:

```
$ mat2 -c
Dependencies required for MAT2 0.1.3:
- Cairo: yes
- Exiftool: yes
- GdkPixbuf from PyGobject: yes
- Mutagen: yes
- Poppler from PyGobject: yes
- PyGobject: yes
```

This should close #35
2018-07-10 21:24:26 +02:00
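One possible way to implement such a check (simplified, with assumed module names): probe the Python modules with importlib and the external binary with shutil.which, then print yes/no for each:

```python
import importlib
import shutil

def check_dependencies() -> dict:
    ret = {'Exiftool': bool(shutil.which('exiftool'))}
    for name, module in (('Mutagen', 'mutagen'), ('PyGobject', 'gi'), ('Cairo', 'cairo')):
        try:
            importlib.import_module(module)
            ret[name] = True
        except ImportError:
            ret[name] = False
    return ret

for key, value in sorted(check_dependencies().items()):
    print('- %s: %s' % (key, 'yes' if value else 'no'))
```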
jvoisin
22e3918f67 Add pylint3 to the ci 2018-07-09 01:22:08 +02:00
jvoisin
080d6769ca Make pylint even happier 2018-07-09 01:11:44 +02:00
jvoisin
86fe3aa584 Fix the previous commit 2018-07-09 00:30:16 +02:00
jvoisin
cc327b1592 Minor improvement of fedora's duration in the testsuite 2018-07-09 00:27:40 +02:00
jvoisin
b4edd6d2a2 Document that MAT2 not being able to detect metadata doesn't mean that the file is clean 2018-07-09 00:17:59 +02:00
jvoisin
bd357b85f8 Remove a useless option that was never implemented anyway 2018-07-09 00:13:16 +02:00
jvoisin
8c21006e6c Fix some pep8 issues spotted by pyflakes 2018-07-08 22:40:36 +02:00
jvoisin
f49aa5cab7 Achieve 100% coverage! 2018-07-08 22:27:37 +02:00
jvoisin
52a2c800b7 Bump coverage again 2018-07-08 21:50:52 +02:00
jvoisin
ad3e7ccee8 Bump coverage for office files and fix some related crashes 2018-07-08 21:35:45 +02:00
jvoisin
ca01484126 Silence a mypy's stupid warning 2018-07-08 17:12:17 +02:00
jvoisin
f9bc022c96 Add defusedxml as an (optional) way to prevent XML-based attacks
Those attacks are DoS-only.
2018-07-08 17:07:26 +02:00
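The optional use can be as simple as a guarded import that falls back to the standard library when defusedxml isn't installed; a hedged sketch, keeping in mind that defusedxml only mitigates DoS-style XML attacks here:

```python
try:
    from defusedxml import ElementTree as ET  # hardened drop-in replacement
except ImportError:
    import xml.etree.ElementTree as ET

root = ET.fromstring('<metadata><author>nobody</author></metadata>')
```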
jvoisin
72e1fda18d Remove a leftover print 2018-07-08 15:19:18 +02:00
jvoisin
3cd4f9111f Bump coverage for torrent handling 2018-07-08 15:13:03 +02:00
jvoisin
b5fcddd6a6 Simplify how torrent files are handled
- Rework the testsuite wrt. torrents
- Fail at parser instantiation on corrupted torrents,
  instead of during the `get_meta` or `remove_all` call
2018-07-08 13:49:11 +02:00
jvoisin
7ea362d908 Bump the coverage for pdf 2018-07-07 18:12:33 +02:00
jvoisin
85455a4419 Fix a mistake in office file revisions handling 2018-07-07 18:05:54 +02:00
jvoisin
9f631a1bb1 Bump a bit the coverage 2018-07-07 18:02:53 +02:00
51 changed files with 3231 additions and 591 deletions

.gitignore

@@ -1,5 +1,9 @@
.*
*.pyc
.coverage
.eggs
.mypy_cache/
build
dist
mat2.egg-info
tags

.gitlab-ci.yml

@@ -9,14 +9,25 @@ bandit:
script: # TODO: remove B405 and B314
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends python3-bandit
- bandit ./mat2 --format txt --skip B101
- bandit -r ./nautilus/ --format txt --skip B101
- bandit -r ./libmat2 --format txt --skip B101,B404,B603,B405,B314
pylint:
stage: linting
script:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends pylint3 python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0
- pylint3 --extension-pkg-whitelist=cairo,gi ./libmat2 ./mat2
# Once nautilus-python is in Debian, decomment it from the line below
- pylint3 --extension-pkg-whitelist=Nautilus,GObject,Gtk,Gio,GLib,gi ./nautilus/mat2.py
pyflakes:
stage: linting
script:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends pyflakes3
- pyflakes3 ./libmat2 ./mat2 ./tests/
- pyflakes3 ./libmat2 ./mat2 ./tests/ ./nautilus
mypy:
stage: linting
@@ -24,23 +35,42 @@ mypy:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends python3-pip
- pip3 install mypy
- mypy mat2 libmat2/*.py --ignore-missing-imports
- mypy --ignore-missing-imports mat2 libmat2/*.py ./nautilus/mat2.py
tests:debian:
stage: test
script:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage
- apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage ffmpeg
- apt-get -qqy purge bubblewrap
- python3-coverage run --branch -m unittest discover -s tests/
- python3-coverage report -m --include 'libmat2/*'
- python3-coverage report --fail-under=90 -m --include 'libmat2/*'
tests:debian_with_bubblewrap:
stage: test
tags:
- whitewhale
script:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage ffmpeg bubblewrap
- python3-coverage run --branch -m unittest discover -s tests/
- python3-coverage report --fail-under=100 -m --include 'libmat2/*'
tests:fedora:
image: fedora
stage: test
tags:
- whitewhale
script:
- dnf install -y python3 python3-mutagen python3-gobject
- dnf install -y gdk-pixbuf2 poppler-glib gdk-pixbuf2 gdk-pixbuf2-modules
- dnf install -y cairo-gobject cairo python3-cairo
- dnf install -y perl-Image-ExifTool mailcap
- dnf install -y python3 python3-mutagen python3-gobject gdk-pixbuf2 poppler-glib gdk-pixbuf2 gdk-pixbuf2-modules cairo-gobject cairo python3-cairo perl-Image-ExifTool mailcap
- gdk-pixbuf-query-loaders-64 > /usr/lib64/gdk-pixbuf-2.0/2.10.0/loaders.cache
- python3 setup.py test
tests:archlinux:
image: archlinux/base
stage: test
tags:
- whitewhale
script:
- pacman -Sy --noconfirm python-mutagen python-gobject gdk-pixbuf2 poppler-glib gdk-pixbuf2 python-cairo perl-image-exiftool python-setuptools mailcap ffmpeg
- python3 setup.py test

.mailmap (new file)

@@ -0,0 +1,5 @@
Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org> totallylegit <totallylegit@dustri.org>
Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org> jvoisin <julien.voisin@dustri.org>
Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org> jvoisin <jvoisin@riseup.net>
Daniel Kahn Gillmor <dkg@fifthhorseman.net> dkg <dkg@fifthhorseman.net>

.pylintrc (new file)

@@ -0,0 +1,17 @@
[FORMAT]
good-names=e,f,i,x,s
max-locals=20
[MESSAGES CONTROL]
disable=
fixme,
invalid-name,
duplicate-code,
missing-docstring,
protected-access,
abstract-method,
wrong-import-position,
catching-non-exception,
cell-var-from-loop,
locally-disabled,
invalid-sequence-index, # pylint doesn't like things like `Tuple[int, bytes]` in type annotation

CHANGELOG.md

@@ -1,3 +1,83 @@
# 0.7.0 - 2019-02-17
- Add support for wmv files
- Add support for gif files
- Add support for html files
- Sandbox external processes via bubblewrap
- Simplify archive-based formats processing
- The Nautilus extension now plays nicer with other extensions
# 0.6.0 - 2018-11-10
- Add lightweight cleaning for jpeg
- Add support for zip files
- Add support for mp4 files
- Improve metadata extraction for archives
- Improve robustness against corrupted embedded files
- Fix a possible security issue on some terminals (control character
injection via --show)
- Various internal cleanup/improvements
# 0.5.0 - 2018-10-23
- Video (.avi files for now) support, via FFmpeg, optionally
- Lightweight cleaning for png and tiff files
- Processing files starting with a dash is now quicker
- Metadata are now displayed sorted
- Recursive metadata support for FLAC files
- Unsupported extensions aren't displayed in `./mat2 -l` anymore
- Improve the display when no metadata are found
- Update the logo according to the GNOME guidelines
- The testsuite is now runnable on the installed version of mat2
- Various internal cleanup/improvements
# 0.4.0 - 2018-10-03
- There is now a policy, for advanced users, to deal with unknown embedded fileformats
- Improve the documentation
- Various minor refactoring
- Improve how corrupted PNG are handled
- Dangerous/advanced cli's options no longer have short versions
- Significant improvements to office files anonymisation
- Archive members are sorted lexicographically
- XML attributes are sorted lexicographically too
- RSID are now stripped
- Dangling references in [Content_types].xml are now removed
- Significant improvements to office files support
- Anonymised office files can now be opened by MS Office without warnings
- The CLI isn't threaded anymore, for it was causing issues
- Various misc typo fix
# 0.3.1 - 2018-09-01
- Document how to install MAT2 for various distributions
- Fix various typos in the documentation/comments
- Add ArchLinux to the CI to ensure that MAT2 is running on it
- Fix the handling of files with a name ending in `.JPG`
- Improve the detection of unsupported extensions in upper-case
- Streamline MAT2's logging
# 0.3.0 - 2018-08-03
- Add a check for missing dependencies
- Add Nautilus extension
- Minor code simplifications
- Improve our linters' coverage
- Add a manpage
- Add folder/multiple files related tests
- Change the logo
# 0.2.0 - 2018-07-10
- Fix various crashes due to malformed files
- Simplify various code-paths
- Remove superfluous debug message
- Remove the `--check` option that never was implemented anyway
- Add a `-c` option to check for MAT2's dependencies
# 0.1.3 - 2018-07-06
- Improve MAT2 resilience against corrupted images

CONTRIBUTING.md

@@ -24,6 +24,14 @@ Since MAT2 is written in Python3, please conform as much as possible to the
1. Update the [changelog](https://0xacab.org/jvoisin/mat2/blob/master/CHANGELOG.md)
2. Update the version in the [mat2](https://0xacab.org/jvoisin/mat2/blob/master/mat2) file
3. Update the version in the [setup.py](https://0xacab.org/jvoisin/mat2/blob/master/setup.py) file
4. Commit the changelog, mat2 and setup.py files
5. Create a tag with `git tag -s $VERSION`
6. Push the tag with `git push --tags`
4. Update the version and date in the [man page](https://0xacab.org/jvoisin/mat2/blob/master/doc/mat2.1)
5. Commit the changelog, man page, mat2 and setup.py files
6. Create a tag with `git tag -s $VERSION`
7. Push the commit with `git push origin master`
8. Push the tag with `git push --tags`
9. Create the signed tarball with `git archive --format=tar.xz --prefix=mat-$VERSION/ $VERSION > mat-$VERSION.tar.xz`
10. Sign the tarball with `gpg --armor --detach-sign mat-$VERSION.tar.xz`
11. Upload the result on Gitlab's [tag page](https://0xacab.org/jvoisin/mat2/tags) and add the changelog there
12. Announce the release on the [mailing list](https://mailman.boum.org/listinfo/mat-dev)
13. Upload the new version on pypi with `python3 setup.py sdist bdist_wheel` then `twine upload -s dist/*`
14. Do the secret release dance

INSTALL.md (new file)

@@ -0,0 +1,76 @@
# Python ecosystem
If you feel like running arbitrary code downloaded over the
internet (pypi doesn't support gpg signatures [anymore](https://github.com/pypa/python-packaging-user-guide/pull/466)),
mat2 is [available on pypi](https://pypi.org/project/mat2/), and can be
installed like this:
```
pip3 install mat2
```
# GNU/Linux
## Optional dependencies
When [bubblewrap](https://github.com/projectatomic/bubblewrap) is
installed, MAT2 uses it to sandbox any external processes it invokes.
## Fedora
Thanks to [atenart](https://ack.tf/), there is a package available on
[Fedora's copr]( https://copr.fedorainfracloud.org/coprs/atenart/mat2/ ).
We use copr (cool other packages repo) as the Mat2 Nautilus plugin depends on
python3-nautilus, which isn't available yet in Fedora (but is distributed
through this copr).
First you need to enable Mat2's copr:
```
dnf -y copr enable atenart/mat2
```
Then you can install both the Mat2 command and Nautilus extension:
```
dnf -y install mat2 mat2-nautilus
```
## Debian
There is a package available in Debian *buster/sid*. The package [doesn't include
the Nautilus extension yet](https://bugs.debian.org/910491).
For Debian 9 *stretch*, there is a way to install it *manually*:
```
# apt install python3-mutagen python3-gi-cairo gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl gir1.2-glib-2.0 gir1.2-poppler-0.18 ffmpeg
# apt install bubblewrap # if you want sandboxing
$ git clone https://0xacab.org/jvoisin/mat2.git
$ cd mat2
$ ./mat2
```
and if you want to install the über-fancy Nautilus extension:
```
# apt install gnome-common gtk-doc-tools libnautilus-extension-dev python-gi-dev python3-dev build-essential
$ git clone https://github.com/GNOME/nautilus-python
$ cd nautilus-python
$ PYTHON=/usr/bin/python3 ./autogen.sh
$ make
# make install
$ mkdir -p ~/.local/share/nautilus-python/extensions/
$ cp ../nautilus/mat2.py ~/.local/share/nautilus-python/extensions/
$ PYTHONPATH=/home/$USER/mat2 PYTHON=/usr/bin/python3 nautilus
```
## Arch Linux
Thanks to [Francois_B](https://www.sciunto.org/), there is a package available on
[Arch linux's AUR](https://aur.archlinux.org/packages/mat2/).
## Gentoo
MAT2 is available in the [torbrowser overlay](https://github.com/MeisterP/torbrowser-overlay).

README.md

@@ -1,6 +1,6 @@
```
_____ _____ _____ ___
| | _ |_ _|_ | Keep you data,
| | _ |_ _|_ | Keep your data,
| | | | | | | | _| trash your meta!
|_|_|_|__|__| |_| |___|
@@ -30,10 +30,11 @@ metadata.
- `python3-mutagen` for audio support
- `python3-gi-cairo` and `gir1.2-poppler-0.18` for PDF support
- `gir1.2-gdkpixbuf-2.0` for images support
- `FFmpeg`, optionally, for video support
- `libimage-exiftool-perl` for everything else
Please note that MAT2 requires at least Python3.5, meaning that it
doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3),
doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3).
# Running the test suite
@@ -41,40 +42,79 @@ doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3),
$ python3 -m unittest discover -v
```
And if you want to see the coverage:
```bash
$ python3-coverage run --branch -m unittest discover -s tests/
$ python3-coverage report -m --include 'libmat2/*'
```
# How to use MAT2
```bash
usage: mat2 [-h] [-v] [-l] [-c | -s | -L] [files [files ...]]
usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]
[--unknown-members policy] [-s | -L]
[files [files ...]]
Metadata anonymisation toolkit 2
positional arguments:
files
files the files to process
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-l, --list list all supported fileformats
-c, --check check if a file is free of harmful metadatas
-s, --show list all the harmful metadata of a file without removing
them
-L, --lightweight remove SOME metadata
-h, --help show this help message and exit
-v, --version show program's version number and exit
-l, --list list all supported fileformats
--check-dependencies check if MAT2 has all the dependencies it needs
-V, --verbose show more verbose status information
--unknown-members policy
how to handle unknown members of archive-style files
(policy should be one of: abort, omit, keep)
-s, --show list harmful metadata detectable by MAT2 without
removing them
-L, --lightweight remove SOME metadata
```
Note that MAT2 **will not** clean files in-place: for a file named
"myfile.png", it will produce a cleaned version named "myfile.cleaned.png".
# Notes about detecting metadata
While MAT2 does its very best to display metadata when the `--show` flag is
passed, a file isn't necessarily free of metadata just because MAT2 doesn't
show any: there is no reliable way to detect every single possible kind of
metadata in complex file formats.
This is why you shouldn't rely on the absence of detected metadata to decide
whether your file must be cleaned or not.
# Notes about the lightweight mode
By default, mat2 might alter the data of your files a bit, in order to remove
as much metadata as possible. For example, text in PDFs might not be selectable anymore,
compressed images might get compressed again, …
Since some users might be willing to trade the presence of some metadata in exchange
for the guarantee that mat2 won't modify the data of their files, there is the
`-L` flag that does precisely that.
# Related software
- The first iteration of [MAT](http://mat.boum.org)
- The first iteration of [MAT](https://mat.boum.org)
- [Exiftool](https://sno.phy.queensu.ca/~phil/exiftool/)
- [pdf-redact-tools](https://github.com/firstlookmedia/pdf-redact-tools), that
tries to deal with *printer dots* too.
- [pdfparanoia](https://github.com/kanzure/pdfparanoia), that removes
watermarks from PDF.
- [Scrambled Exif](https://f-droid.org/packages/com.jarsilio.android.scrambledeggsif/),
an open-source Android application to remove metadata from pictures.
# Contact
If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues).
If you think that a more private contact is needed (eg. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat@dustri.org`,
If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues)
or the [mailing list](https://mailman.boum.org/listinfo/mat-dev).
Should a more private contact be needed (eg. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat2@dustri.org`,
using the gpg key `9FCDEE9E1A381F311EA62A7404D041E8171901CC`.
# License
@@ -93,6 +133,7 @@ You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Copyright 2018 Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org>
Copyright 2016 Marie Rose for MAT2's logo
# Thanks

(binary image, not shown: 3.1 KiB before, 28 KiB after)

(logo SVG)

@@ -1,27 +1,630 @@
[The previous 27-line logo SVG is replaced by a ~630-line 128×128 drawing made with Inkscape from the GNOME "Adwaita Icon Template" (GNOME Design Team, CC-BY-SA 4.0); the full SVG source is not reproduced here.]
d="m 71.242911,274.02997 1.391592,0.20557 1.043694,3.01499 2.01781,0.68522 1.530751,1.57602 -0.904535,2.87795 -2.365707,2.32976 -0.139159,3.56317 -1.322013,1.98715 -2.504867,-1.85011 -0.278318,-2.67237 -1.530752,-1.78159 -1.113274,-3.08351 3.61814,-4.17987 z"
id="path3466"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 62.893354,276.5653 3.270244,1.16489 0.06958,3.70021 -0.556637,0.68523 0.974115,3.70021 1.252433,1.64454 0.06958,3.08351 -2.017809,1.37045 -2.574447,8.08566 -2.574447,-1.30193 -1.948229,-9.79872 z"
id="path3468"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 58.440258,283.5546 h 0.556637 l 0.417478,0.95931 -0.208739,1.30193 -1.461172,0.13704 z"
id="path3472"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 56.700767,279.16916 -1.113274,0.95931 0.834956,2.80943 1.600331,0.20556 0.487058,-2.05567 -0.695796,-1.91863 z"
id="path3474"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 53.152207,272.17987 0.139159,5.13918 1.87865,1.23341 0.834955,-0.54818 0.904535,-3.63169 1.530752,-1.57602 -1.669911,-3.97431 -3.548561,3.08352 z"
id="path3476"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 45.915924,258.33832 -0.208739,3.83726 -4.731414,3.97431 1.948229,2.80942 8.488716,0.82227 0.417478,1.98715 1.043694,-0.75375 0.487057,-2.19272 1.182854,-1.64454 -0.417478,-1.09635 -1.87865,-2.60386 -3.757299,-1.37045 -1.461174,-3.22056 z"
id="path3480"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 40.279975,263.68308 1.669912,0.6167 3.061502,-6.37259 -0.904535,-5.61884 -2.504867,-0.34262 -1.391592,-1.2334 2.156968,-7.606 -2.087388,-4.45396 -3.409402,1.57602 -0.834956,3.42612 -1.87865,0.20557 -0.347898,2.1242 1.530752,1.64454 h 1.322013 l 0.626217,3.90578 2.296127,5.61884 -0.347898,2.19272 z"
id="path3482"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 66.531337,247.61066 -0.590018,-0.31657 -0.420783,-1.71262 0.427793,-0.66945 1.306823,-1.13114 2.316342,-1.38746 1.06612,0.23465 -0.01701,2.21105 -2.36166,3.35302 z"
id="path4284"
inkscape:connector-curvature="0"
inkscape:transform-center-x="4.9927099"
inkscape:transform-center-y="-9.3161687" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 72.373733,232.22199 -0.815102,1.03206 4.017286,4.12827 1.571981,0.17201 1.339096,-0.86006 0.931544,0.63071 2.387083,-2.98152 -2.794634,-0.91739 -3.027519,0.22934 z"
id="path3601"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 57.407878,237.1102 -1.301737,2.34289 -1.301738,0.61888 -0.17955,1.45878 -4.488748,1.54719 -0.403989,1.50299 0.314213,0.30944 1.032412,0.0884 v 1.41457 l 1.660839,1.50299 2.154598,-1.94504 1.571064,0.35364 2.738136,-1.94504 -1.436399,-2.56392 0.987525,-3.44803 -0.583538,-1.37037 z"
id="path3603"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 62.104217,246.96003 5.843936,-6.55723 0.659867,-2.66044 2.221783,-0.40757 -0.386451,-3.39556 -2.000988,-0.60704 -6.246127,-0.36572 -2.624948,2.5137 1.519708,2.75102 -0.347742,5.51876 z"
id="path3605"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 71.024647,249.63275 5.822153,1.31875 1.047988,-3.89891 -1.280874,-1.43343 0.523995,-6.02038 -3.551515,5.275 0.34933,2.06413 -2.037753,0.80272 -1.164431,0.45869 z"
id="path3607"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 59.099222,247.24427 -2.095974,1.72011 -0.05822,1.60543 0.465772,1.72011 1.455539,0.97473 -0.407551,0.97473 2.328861,-0.34402 2.27064,-2.86685 -1.571981,-0.57337 -0.640437,-2.86685 -1.51376,-0.40136 z"
id="path3609"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 44.078067,234.34346 0.291107,4.47228 -1.863089,1.43342 2.095976,3.72691 2.037753,0.0573 2.27064,-3.55489 -2.969297,-4.98831 z"
id="path3611"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 44.660282,245.46683 -3.318627,4.30027 1.339096,1.26141 2.561747,-0.28668 1.222652,-3.15354 z"
id="path3613"
inkscape:connector-curvature="0" />
</g>
</g>
</svg>

Before

Width:  |  Height:  |  Size: 1.7 KiB

After

Width:  |  Height:  |  Size: 34 KiB

View File

@@ -6,7 +6,7 @@ Lightweight cleaning mode
Due to *popular* request, MAT2 is providing a *lightweight* cleaning mode,
that only cleans the superficial metadata of your file, but not
the ones that might be in **embeded** resources. Like for example,
the ones that might be in **embedded** resources. Like for example,
images in a PDF or an office document.
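As an illustration, here is a minimal sketch of what the lightweight path looks like through the library API shown further down in this diff; it assumes libmat2 0.7.0 is importable and that ./photo.jpg exists, and mirrors the attribute the -L flag toggles.

from libmat2 import parser_factory

parser, mimetype = parser_factory.get_parser('./photo.jpg')
if parser is not None:
    parser.lightweight_cleaning = True  # attribute defined in libmat2/abstract.py below
    parser.remove_all()                 # writes ./photo.cleaned.jpg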
Revisions handling
@@ -61,3 +61,11 @@ Images handling
When possible, images are handled like PDF: rendered on a surface, then saved
to the filesystem. This ensures that all metadata is removed.
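A standalone sketch of that rendering round-trip for a PNG, using pycairo the same way PNGParser.remove_all does in libmat2/images.py below; 'input.png' is a placeholder path.

import cairo

surface = cairo.ImageSurface.create_from_png('input.png')  # re-render the pixel data
surface.write_to_png('input.cleaned.png')                  # into a brand new file, without metadata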
XML attacks
-----------
Since our threat model conveniently excludes files crafted to specifically
bypass MAT2, fileformats containing harmful XML are out of our scope.
But since MAT2 is using [etree](https://docs.python.org/3/library/xml.html#xml-vulnerabilities)
to process XML, it's "only" vulnerable to DoS, and not memory corruption:
odds are that the user will notice that the cleaning didn't succeed.

doc/mat2.1 (new file, 76 lines)
View File

@@ -0,0 +1,76 @@
.TH MAT2 "1" "February 2019" "MAT2 0.7.0" "User Commands"
.SH NAME
mat2 \- the metadata anonymisation toolkit 2
.SH SYNOPSIS
\fBmat2\fR [\-h] [\-v] [\-l] [\-V] [-s | -L] [\fIfiles\fR [\fIfiles ...\fR]]
.SH DESCRIPTION
.B mat2
removes metadata from various fileformats. It supports a wide variety of file
formats, audio, office, images, …
Be careful: mat2 does not clean files in-place; instead, it produces a new file with the word
"cleaned" inserted between the filename and its extension, for example "filename.cleaned.png"
for a file named "filename.png".
.SH OPTIONS
.SS "positional arguments:"
.TP
\fBfiles\fR
the files to process
.SS "optional arguments:"
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-v\fR, \fB\-\-version\fR
show program's version number and exit
.TP
\fB\-l\fR, \fB\-\-list\fR
list all supported fileformats
.TP
\fB\-\-check\-dependencies\fR
check if MAT2 has all the dependencies it needs
.TP
\fB\-V\fR, \fB\-\-verbose\fR
show more verbose status information
.TP
\fB\-\-unknown-members\fR \fIpolicy\fR
how to handle unknown members of archive-style files (policy should be one of: abort, omit, keep)
.TP
\fB\-s\fR, \fB\-\-show\fR
list harmful metadata detectable by MAT2 without
removing them
.TP
\fB\-L\fR, \fB\-\-lightweight\fR
remove SOME metadata
.SH EXAMPLES
To remove all the metadata from a PDF file:
.PP
.nf
.RS
mat2 ./myfile.pdf
.RE
.fi
.PP
.SH BUGS
While mat2 does its very best to remove every single piece of metadata,
it's still in beta, and \fBsome\fR might remain. Should you encounter
some issues, check the bugtracker: https://0xacab.org/jvoisin/mat2/issues
.PP
Please use accordingly and be careful.
.SH AUTHOR
This software was made by Julien (jvoisin) Voisin with the support of the Tails project.
.SH COPYRIGHT
This software is released under the LGPLv3.
.SH "SEE ALSO"
.BR exiftool (1p)
.BR pdf-redact-tools (1)

View File

@@ -1,7 +1,18 @@
#!/bin/env python3
#!/usr/bin/env python3
import collections
import enum
import importlib
from typing import Dict, Optional
from . import exiftool, video
# make pyflakes happy
assert Dict
assert Optional
# A set of extensions that aren't supported, despite matching a supported mimetype
unsupported_extensions = {
UNSUPPORTED_EXTENSIONS = {
'.asc',
'.bat',
'.brf',
@@ -17,3 +28,35 @@ unsupported_extensions = {
'.xsd',
'.xsl',
}
DEPENDENCIES = {
'cairo': 'Cairo',
'gi': 'PyGobject',
'gi.repository.GdkPixbuf': 'GdkPixbuf from PyGobject',
'gi.repository.Poppler': 'Poppler from PyGobject',
'gi.repository.GLib': 'GLib from PyGobject',
'mutagen': 'Mutagen',
}
def check_dependencies() -> Dict[str, bool]:
ret = collections.defaultdict(bool) # type: Dict[str, bool]
ret['Exiftool'] = bool(exiftool._get_exiftool_path())
ret['Ffmpeg'] = bool(video._get_ffmpeg_path())
for key, value in DEPENDENCIES.items():
ret[value] = True
try:
importlib.import_module(key)
except ImportError: # pragma: no cover
ret[value] = False # pragma: no cover
return ret
@enum.unique
class UnknownMemberPolicy(enum.Enum):
ABORT = 'abort'
OMIT = 'omit'
KEEP = 'keep'
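A hedged usage sketch of the helper above, roughly what the --check-dependencies flag documented in the man page reports; it only assumes that libmat2 is importable.

from libmat2 import check_dependencies

for name, found in check_dependencies().items():
    print('{}: {}'.format(name, 'installed' if found else 'MISSING'))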

View File

@@ -1,27 +1,41 @@
import abc
import os
from typing import Set, Dict
import re
from typing import Set, Dict, Union
assert Set # make pyflakes happy
class AbstractParser(abc.ABC):
""" This is the base class of every parser.
It might yield `ValueError` on instantiation on invalid files,
and `RuntimeError` when something went wrong in `remove_all`.
"""
meta_list = set() # type: Set[str]
mimetypes = set() # type: Set[str]
def __init__(self, filename: str) -> None:
"""
:raises ValueError: Raised upon an invalid file
"""
if re.search('^[a-z0-9./]', filename) is None:
# Some parsers are calling external binaries,
# this prevents shell command injections
filename = os.path.join('.', filename)
self.filename = filename
fname, extension = os.path.splitext(filename)
self.output_filename = fname + '.cleaned' + extension
self.lightweight_cleaning = False
@abc.abstractmethod
def get_meta(self) -> Dict[str, str]:
def get_meta(self) -> Dict[str, Union[str, dict]]:
pass # pragma: no cover
@abc.abstractmethod
def remove_all(self) -> bool:
"""
:raises RuntimeError: Raised if the cleaning process went wrong.
"""
# pylint: disable=unnecessary-pass
pass # pragma: no cover
def remove_all_lightweight(self) -> bool:
""" Remove _SOME_ metadata. """
return self.remove_all()
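To illustrate the contract above, a minimal hypothetical subclass, not part of libmat2, close in spirit to the HarmlessParser shown later in this diff.

import shutil
from typing import Dict, Union

from libmat2 import abstract

class NullParser(abstract.AbstractParser):
    """ Hypothetical example: declare the supported mimetypes and implement the two abstract methods. """
    mimetypes = {'application/x-example', }

    def get_meta(self) -> Dict[str, Union[str, dict]]:
        return {}  # pretend the format cannot carry metadata

    def remove_all(self) -> bool:
        shutil.copy(self.filename, self.output_filename)  # <name>.cleaned.<extension>
        return True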

libmat2/archive.py (new file, 178 lines)
View File

@@ -0,0 +1,178 @@
import zipfile
import datetime
import tempfile
import os
import logging
import shutil
from typing import Dict, Set, Pattern, Union, Any
from . import abstract, UnknownMemberPolicy, parser_factory
# Make pyflakes happy
assert Set
assert Pattern
assert Union
class ArchiveBasedAbstractParser(abstract.AbstractParser):
""" Office files (.docx, .odt, …) are zipped files. """
def __init__(self, filename):
super().__init__(filename)
# Those are the files that have a format that _isn't_
# supported by MAT2, but that we want to keep anyway.
self.files_to_keep = set() # type: Set[Pattern]
# Those are the files that we _do not_ want to keep,
# no matter if they are supported or not.
self.files_to_omit = set() # type: Set[Pattern]
# what should the parser do if it encounters an unknown file in
# the archive?
self.unknown_member_policy = UnknownMemberPolicy.ABORT # type: UnknownMemberPolicy
try: # better fail here than later
zipfile.ZipFile(self.filename)
except zipfile.BadZipFile:
raise ValueError
def _specific_cleanup(self, full_path: str) -> bool:
""" This method can be used to apply specific treatment
to files present in the archive."""
# pylint: disable=unused-argument,no-self-use
return True # pragma: no cover
def _specific_get_meta(self, full_path: str, file_path: str) -> Dict[str, Any]:
""" This method can be used to extract specific metadata
from files present in the archive."""
# pylint: disable=unused-argument,no-self-use
return {} # pragma: no cover
@staticmethod
def _clean_zipinfo(zipinfo: zipfile.ZipInfo) -> zipfile.ZipInfo:
zipinfo.create_system = 3 # Linux
zipinfo.comment = b''
zipinfo.date_time = (1980, 1, 1, 0, 0, 0) # this is as early as a zipfile can be
return zipinfo
@staticmethod
def _get_zipinfo_meta(zipinfo: zipfile.ZipInfo) -> Dict[str, str]:
metadata = {}
if zipinfo.create_system == 3: # this is Linux
pass
elif zipinfo.create_system == 2:
metadata['create_system'] = 'Windows'
else:
metadata['create_system'] = 'Weird'
if zipinfo.comment:
metadata['comment'] = zipinfo.comment # type: ignore
if zipinfo.date_time != (1980, 1, 1, 0, 0, 0):
metadata['date_time'] = str(datetime.datetime(*zipinfo.date_time))
return metadata
def get_meta(self) -> Dict[str, Union[str, dict]]:
meta = dict() # type: Dict[str, Union[str, dict]]
with zipfile.ZipFile(self.filename) as zin:
temp_folder = tempfile.mkdtemp()
for item in zin.infolist():
local_meta = dict() # type: Dict[str, Union[str, Dict]]
for k, v in self._get_zipinfo_meta(item).items():
local_meta[k] = v
if item.filename[-1] == '/': # pragma: no cover
# `is_dir` is added in Python3.6
continue # don't keep empty folders
zin.extract(member=item, path=temp_folder)
full_path = os.path.join(temp_folder, item.filename)
specific_meta = self._specific_get_meta(full_path, item.filename)
for (k, v) in specific_meta.items():
local_meta[k] = v
tmp_parser, _ = parser_factory.get_parser(full_path) # type: ignore
if tmp_parser:
for k, v in tmp_parser.get_meta().items():
local_meta[k] = v
if local_meta:
meta[item.filename] = local_meta
shutil.rmtree(temp_folder)
return meta
def remove_all(self) -> bool:
# pylint: disable=too-many-branches
with zipfile.ZipFile(self.filename) as zin,\
zipfile.ZipFile(self.output_filename, 'w') as zout:
temp_folder = tempfile.mkdtemp()
abort = False
# Since files order is a fingerprint factor,
# we're iterating (and thus inserting) them in lexicographic order.
for item in sorted(zin.infolist(), key=lambda z: z.filename):
if item.filename[-1] == '/': # `is_dir` is added in Python3.6
continue # don't keep empty folders
zin.extract(member=item, path=temp_folder)
full_path = os.path.join(temp_folder, item.filename)
if self._specific_cleanup(full_path) is False:
logging.warning("Something went wrong during deep cleaning of %s",
item.filename)
abort = True
continue
if any(map(lambda r: r.search(item.filename), self.files_to_keep)):
# those files aren't supported, but we want to add them anyway
pass
elif any(map(lambda r: r.search(item.filename), self.files_to_omit)):
continue
else: # supported files that we want to first clean, then add
tmp_parser, mtype = parser_factory.get_parser(full_path) # type: ignore
if not tmp_parser:
if self.unknown_member_policy == UnknownMemberPolicy.OMIT:
logging.warning("In file %s, omitting unknown element %s (format: %s)",
self.filename, item.filename, mtype)
continue
elif self.unknown_member_policy == UnknownMemberPolicy.KEEP:
logging.warning("In file %s, keeping unknown element %s (format: %s)",
self.filename, item.filename, mtype)
else:
logging.error("In file %s, element %s's format (%s) " \
"isn't supported",
self.filename, item.filename, mtype)
abort = True
continue
if tmp_parser:
if tmp_parser.remove_all() is False:
logging.warning("In file %s, something went wrong \
with the cleaning of %s \
(format: %s)",
self.filename, item.filename, mtype)
abort = True
continue
os.rename(tmp_parser.output_filename, full_path)
zinfo = zipfile.ZipInfo(item.filename) # type: ignore
clean_zinfo = self._clean_zipinfo(zinfo)
with open(full_path, 'rb') as f:
zout.writestr(clean_zinfo, f.read())
shutil.rmtree(temp_folder)
if abort:
os.remove(self.output_filename)
return False
return True
class ZipParser(ArchiveBasedAbstractParser):
mimetypes = {'application/zip'}
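A usage sketch for the generic zip support above; './backup.zip' is a placeholder, and unknown_member_policy is the attribute that the --unknown-members CLI option described in the man page corresponds to.

from libmat2 import UnknownMemberPolicy
from libmat2.archive import ZipParser

p = ZipParser('./backup.zip')                       # raises ValueError if the file isn't a valid zip
p.unknown_member_policy = UnknownMemberPolicy.OMIT  # drop members MAT2 can't clean
print(p.get_meta())                                 # per-member metadata, keyed by member name
p.remove_all()                                      # writes ./backup.cleaned.zip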

View File

@@ -1,8 +1,12 @@
import mimetypes
import os
import shutil
import tempfile
from typing import Dict, Union
import mutagen
from . import abstract
from . import abstract, parser_factory
class MutagenParser(abstract.AbstractParser):
@@ -13,13 +17,13 @@ class MutagenParser(abstract.AbstractParser):
except mutagen.MutagenError:
raise ValueError
def get_meta(self):
def get_meta(self) -> Dict[str, Union[str, dict]]:
f = mutagen.File(self.filename)
if f.tags:
return {k:', '.join(v) for k, v in f.tags.items()}
return {}
def remove_all(self):
def remove_all(self) -> bool:
shutil.copy(self.filename, self.output_filename)
f = mutagen.File(self.output_filename)
f.delete()
@@ -30,8 +34,8 @@ class MutagenParser(abstract.AbstractParser):
class MP3Parser(MutagenParser):
mimetypes = {'audio/mpeg', }
def get_meta(self):
metadata = {}
def get_meta(self) -> Dict[str, Union[str, dict]]:
metadata = {} # type: Dict[str, Union[str, dict]]
meta = mutagen.File(self.filename).tags
for key in meta:
metadata[key.rstrip(' \t\r\n\0')] = ', '.join(map(str, meta[key].text))
@@ -44,3 +48,30 @@ class OGGParser(MutagenParser):
class FLACParser(MutagenParser):
mimetypes = {'audio/flac', 'audio/x-flac'}
def remove_all(self) -> bool:
shutil.copy(self.filename, self.output_filename)
f = mutagen.File(self.output_filename)
f.clear_pictures()
f.delete()
f.save(deleteid3=True)
return True
def get_meta(self) -> Dict[str, Union[str, dict]]:
meta = super().get_meta()
for num, picture in enumerate(mutagen.File(self.filename).pictures):
name = picture.desc if picture.desc else 'Cover %d' % num
extension = mimetypes.guess_extension(picture.mime)
if extension is None: # pragma: no cover
meta[name] = 'harmful data'
continue
_, fname = tempfile.mkstemp()
fname = fname + extension
with open(fname, 'wb') as f:
f.write(picture.data)
p, _ = parser_factory.get_parser(fname) # type: ignore
# Mypy chokes on ternaries :/
meta[name] = p.get_meta() if p else 'harmful data' # type: ignore
os.remove(fname)
return meta
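To see the recursive cover-art inspection above in action, a short sketch; './song.flac' is a placeholder path.

from libmat2.audio import FLACParser

p = FLACParser('./song.flac')  # raises ValueError if mutagen can't open the file
print(p.get_meta())            # tags, plus nested metadata for every embedded cover picture
p.remove_all()                 # copies to ./song.cleaned.flac, then strips tags and pictures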

libmat2/exiftool.py (new file, 70 lines)
View File

@@ -0,0 +1,70 @@
import json
import logging
import os
from typing import Dict, Union, Set
from . import abstract
from . import subprocess
# Make pyflakes happy
assert Set
class ExiftoolParser(abstract.AbstractParser):
""" Exiftool is often the easiest way to get all the metadata
from a file, which is why several parsers are re-using its `get_meta`
method.
"""
meta_whitelist = set() # type: Set[str]
def get_meta(self) -> Dict[str, Union[str, dict]]:
out = subprocess.run([_get_exiftool_path(), '-json', self.filename],
input_filename=self.filename,
check=True, stdout=subprocess.PIPE).stdout
meta = json.loads(out.decode('utf-8'))[0]
for key in self.meta_whitelist:
meta.pop(key, None)
return meta
def _lightweight_cleanup(self) -> bool:
if os.path.exists(self.output_filename):
try:
# exiftool can't force output to existing files
os.remove(self.output_filename)
except OSError as e: # pragma: no cover
logging.error("The output file %s is already existing and \
can't be overwritten: %s.", self.filename, e)
return False
# Note: '-All=' must be followed by a known exiftool option.
# Also, '-CommonIFD0' is needed for .tiff files
cmd = [_get_exiftool_path(),
'-all=', # remove metadata
'-adobe=', # remove adobe-specific metadata
'-exif:all=', # remove all exif metadata
'-Time:All=', # remove all timestamps
'-quiet', # don't show useless logs
'-CommonIFD0=', # remove IFD0 metadata
'-o', self.output_filename,
self.filename]
try:
subprocess.run(cmd, check=True,
input_filename=self.filename,
output_filename=self.output_filename)
except subprocess.CalledProcessError as e: # pragma: no cover
logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
return False
return True
def _get_exiftool_path() -> str: # pragma: no cover
possible_pathes = {
'/usr/bin/exiftool', # debian/fedora
'/usr/bin/vendor_perl/exiftool', # archlinux
}
for possible_path in possible_pathes:
if os.path.isfile(possible_path):
if os.access(possible_path, os.X_OK):
return possible_path
raise RuntimeError("Unable to find exiftool")

View File

@@ -1,13 +1,13 @@
import shutil
from typing import Dict
from typing import Dict, Union
from . import abstract
class HarmlessParser(abstract.AbstractParser):
""" This is the parser for filetypes that do not contain metadata. """
""" This is the parser for filetypes that can not contain metadata. """
mimetypes = {'text/plain', 'image/x-ms-bmp'}
def get_meta(self) -> Dict[str, str]:
def get_meta(self) -> Dict[str, Union[str, dict]]:
return dict()
def remove_all(self) -> bool:

libmat2/html.py (new file, 69 lines)
View File

@@ -0,0 +1,69 @@
from html import parser
from typing import Dict, Any, List, Tuple
from . import abstract
class HTMLParser(abstract.AbstractParser):
mimetypes = {'text/html', }
def __init__(self, filename):
super().__init__(filename)
self.__parser = _HTMLParser()
with open(filename) as f:
self.__parser.feed(f.read())
self.__parser.close()
def get_meta(self) -> Dict[str, Any]:
return self.__parser.get_meta()
def remove_all(self) -> bool:
return self.__parser.remove_all(self.output_filename)
class _HTMLParser(parser.HTMLParser):
"""Python doesn't have a validating html parser in its stdlib, so
we're using an internal queue to track all the opening/closing tags,
and hoping for the best.
"""
def __init__(self):
super().__init__()
self.__textrepr = ''
self.__meta = {}
self.__validation_queue = []
def handle_starttag(self, tag: str, attrs: List[Tuple[str, str]]):
self.__textrepr += self.get_starttag_text()
self.__validation_queue.append(tag)
def handle_endtag(self, tag: str):
if not self.__validation_queue:
raise ValueError
elif tag != self.__validation_queue.pop():
raise ValueError
# There is no `get_endtag_text()` method :/
self.__textrepr += '</' + tag + '>\n'
def handle_data(self, data: str):
if data.strip():
self.__textrepr += data
def handle_startendtag(self, tag: str, attrs: List[Tuple[str, str]]):
if tag == 'meta':
meta = {k:v for k, v in attrs}
name = meta.get('name', 'harmful metadata')
content = meta.get('content', 'harmful data')
self.__meta[name] = content
else:
self.__textrepr += self.get_starttag_text()
def remove_all(self, output_filename: str) -> bool:
if self.__validation_queue:
raise ValueError
with open(output_filename, 'w') as f:
f.write(self.__textrepr)
return True
def get_meta(self) -> Dict[str, Any]:
if self.__validation_queue:
raise ValueError
return self.__meta
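A short usage sketch for the HTML support above; './page.html' is a placeholder path and the printed dict is purely illustrative.

from libmat2 import html

p = html.HTMLParser('./page.html')  # raises ValueError on unbalanced tags
print(p.get_meta())                 # e.g. {'generator': 'SomeEditor 1.2'}
p.remove_all()                      # writes ./page.cleaned.html, with the <meta> tags dropped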

View File

@@ -1,48 +1,19 @@
import subprocess
import imghdr
import json
import os
import shutil
import tempfile
import re
from typing import Set
import cairo
import gi
gi.require_version('GdkPixbuf', '2.0')
from gi.repository import GdkPixbuf
from gi.repository import GdkPixbuf, GLib
from . import abstract
from . import exiftool
# Make pyflakes happy
assert Set
class _ImageParser(abstract.AbstractParser):
@staticmethod
def __handle_problematic_filename(filename: str, callback) -> str:
""" This method takes a filename with a problematic name,
and safely applies it a `callback`."""
tmpdirname = tempfile.mkdtemp()
fname = os.path.join(tmpdirname, "temp_file")
shutil.copy(filename, fname)
out = callback(fname)
shutil.rmtree(tmpdirname)
return out
def get_meta(self):
""" There is no way to escape the leading(s) dash(es) of the current
self.filename to prevent parameter injections, so we need to take care
of this.
"""
fun = lambda f: subprocess.check_output(['/usr/bin/exiftool', '-json', f])
if re.search('^[a-z0-9/]', self.filename) is None:
out = self.__handle_problematic_filename(self.filename, fun)
else:
out = fun(self.filename)
meta = json.loads(out.decode('utf-8'))[0]
for key in self.meta_whitelist:
meta.pop(key, None)
return meta
class PNGParser(_ImageParser):
class PNGParser(exiftool.ExiftoolParser):
mimetypes = {'image/png', }
meta_whitelist = {'SourceFile', 'ExifToolVersion', 'FileName',
'Directory', 'FileSize', 'FileModifyDate',
@@ -54,36 +25,63 @@ class PNGParser(_ImageParser):
def __init__(self, filename):
super().__init__(filename)
try: # better fail here than later
cairo.ImageSurface.create_from_png(self.filename)
except MemoryError:
if imghdr.what(filename) != 'png':
raise ValueError
def remove_all(self):
try: # better fail here than later
cairo.ImageSurface.create_from_png(self.filename)
except MemoryError: # pragma: no cover
raise ValueError
def remove_all(self) -> bool:
if self.lightweight_cleaning:
return self._lightweight_cleanup()
surface = cairo.ImageSurface.create_from_png(self.filename)
surface.write_to_png(self.output_filename)
return True
class GdkPixbufAbstractParser(_ImageParser):
class GIFParser(exiftool.ExiftoolParser):
mimetypes = {'image/gif'}
meta_whitelist = {'AnimationIterations', 'BackgroundColor', 'BitsPerPixel',
'ColorResolutionDepth', 'Directory', 'Duration',
'ExifToolVersion', 'FileAccessDate',
'FileInodeChangeDate', 'FileModifyDate', 'FileName',
'FilePermissions', 'FileSize', 'FileType',
'FileTypeExtension', 'FrameCount', 'GIFVersion',
'HasColorMap', 'ImageHeight', 'ImageSize', 'ImageWidth',
'MIMEType', 'Megapixels', 'SourceFile',}
def remove_all(self) -> bool:
return self._lightweight_cleanup()
class GdkPixbufAbstractParser(exiftool.ExiftoolParser):
""" GdkPixbuf can handle a lot of surfaces, so we're rending images on it,
this has the side-effect of removing metadata completely.
this has the side-effect of completely removing metadata.
"""
_type = ''
def remove_all(self):
_, extension = os.path.splitext(self.filename)
pixbuf = GdkPixbuf.Pixbuf.new_from_file(self.filename)
if extension == '.jpg':
extension = '.jpeg' # gdk is picky
pixbuf.savev(self.output_filename, extension[1:], [], [])
return True
def __init__(self, filename):
super().__init__(filename)
if imghdr.what(filename) != self._type: # better safe than sorry
# we can't use imghdr here because of https://bugs.python.org/issue28591
try:
GdkPixbuf.Pixbuf.new_from_file(self.filename)
except GLib.GError:
raise ValueError
def remove_all(self) -> bool:
if self.lightweight_cleaning:
return self._lightweight_cleanup()
_, extension = os.path.splitext(self.filename)
pixbuf = GdkPixbuf.Pixbuf.new_from_file(self.filename)
if extension.lower() == '.jpg':
extension = '.jpeg' # gdk is picky
pixbuf.savev(self.output_filename, type=extension[1:], option_keys=[], option_values=[])
return True
class JPGParser(GdkPixbufAbstractParser):
_type = 'jpeg'

View File

@@ -1,124 +1,46 @@
import logging
import os
import re
import shutil
import tempfile
import datetime
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Set, Pattern
from typing import Dict, Set, Pattern, Tuple, Any
import xml.etree.ElementTree as ET # type: ignore
from . import abstract, parser_factory
from .archive import ArchiveBasedAbstractParser
# pylint: disable=line-too-long
# Make pyflakes happy
assert Set
assert Pattern
def _parse_xml(full_path: str):
""" This function parse XML with namespace support. """
def parse_map(f): # etree support for ns is a bit rough
ns_map = dict()
for event, (k, v) in ET.iterparse(f, ("start-ns", )):
if event == "start-ns":
ns_map[k] = v
return ns_map
def _parse_xml(full_path: str) -> Tuple[ET.ElementTree, Dict[str, str]]:
""" This function parses XML, with namespace support. """
namespace_map = dict()
for _, (key, value) in ET.iterparse(full_path, ("start-ns", )):
# The ns[0-9]+ namespaces are reserved for internal usage, so
# we have to use another nomenclature.
if re.match('^ns[0-9]+$', key, re.I): # pragma: no cover
key = 'mat' + key[2:]
ns = parse_map(full_path)
namespace_map[key] = value
ET.register_namespace(key, value)
# Register the namespaces
for k, v in ns.items():
ET.register_namespace(k, v)
return ET.parse(full_path), ns
return ET.parse(full_path), namespace_map
class ArchiveBasedAbstractParser(abstract.AbstractParser):
# Those are the files that have a format that _isn't_
# supported by MAT2, but that we want to keep anyway.
files_to_keep = set() # type: Set[str]
def _sort_xml_attributes(full_path: str) -> bool:
""" Sort xml attributes lexicographically,
because it's possible to fingerprint producers (MS Office, Libreoffice, …)
since they are all using different orders.
"""
tree = ET.parse(full_path)
# Those are the files that we _do not_ want to keep,
# no matter if they are supported or not.
files_to_omit = set() # type: Set[Pattern]
for c in tree.getroot():
c[:] = sorted(c, key=lambda child: (child.tag, child.get('desc')))
def __init__(self, filename):
super().__init__(filename)
try: # better fail here than later
zipfile.ZipFile(self.filename)
except zipfile.BadZipFile:
raise ValueError
def _specific_cleanup(self, full_path: str) -> bool:
""" This method can be used to apply specific treatment
to files present in the archive."""
return True
def _clean_zipinfo(self, zipinfo: zipfile.ZipInfo) -> zipfile.ZipInfo:
zipinfo.create_system = 3 # Linux
zipinfo.comment = b''
zipinfo.date_time = (1980, 1, 1, 0, 0, 0)
return zipinfo
def _get_zipinfo_meta(self, zipinfo: zipfile.ZipInfo) -> Dict[str, str]:
metadata = {}
if zipinfo.create_system == 3:
#metadata['create_system'] = 'Linux'
pass
elif zipinfo.create_system == 2:
metadata['create_system'] = 'Windows'
else:
metadata['create_system'] = 'Weird'
if zipinfo.comment:
metadata['comment'] = zipinfo.comment # type: ignore
if zipinfo.date_time != (1980, 1, 1, 0, 0, 0):
metadata['date_time'] = str(datetime.datetime(*zipinfo.date_time))
return metadata
def remove_all(self) -> bool:
with zipfile.ZipFile(self.filename) as zin,\
zipfile.ZipFile(self.output_filename, 'w') as zout:
temp_folder = tempfile.mkdtemp()
for item in zin.infolist():
if item.filename[-1] == '/': # `is_dir` is added in Python3.6
continue # don't keep empty folders
zin.extract(member=item, path=temp_folder)
full_path = os.path.join(temp_folder, item.filename)
if self._specific_cleanup(full_path) is False:
shutil.rmtree(temp_folder)
os.remove(self.output_filename)
print("Something went wrong during deep cleaning of %s" % item.filename)
return False
if item.filename in self.files_to_keep:
# those files aren't supported, but we want to add them anyway
pass
elif any(map(lambda r: r.search(item.filename), self.files_to_omit)):
continue
else:
# supported files that we want to clean then add
tmp_parser, mtype = parser_factory.get_parser(full_path) # type: ignore
if not tmp_parser:
shutil.rmtree(temp_folder)
os.remove(self.output_filename)
print("%s's format (%s) isn't supported" % (item.filename, mtype))
return False
tmp_parser.remove_all()
os.rename(tmp_parser.output_filename, full_path)
zinfo = zipfile.ZipInfo(item.filename) # type: ignore
clean_zinfo = self._clean_zipinfo(zinfo)
with open(full_path, 'rb') as f:
zout.writestr(clean_zinfo, f.read())
shutil.rmtree(temp_folder)
return True
tree.write(full_path, xml_declaration=True)
return True
class MSOfficeParser(ArchiveBasedAbstractParser):
@@ -127,80 +49,267 @@ class MSOfficeParser(ArchiveBasedAbstractParser):
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'application/vnd.openxmlformats-officedocument.presentationml.presentation'
}
files_to_keep = {
'[Content_Types].xml',
'_rels/.rels',
'word/_rels/document.xml.rels',
'word/document.xml',
'word/fontTable.xml',
'word/settings.xml',
'word/styles.xml',
content_types_to_keep = {
'application/vnd.openxmlformats-officedocument.wordprocessingml.endnotes+xml', # /word/endnotes.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.footnotes+xml', # /word/footnotes.xml
'application/vnd.openxmlformats-officedocument.extended-properties+xml', # /docProps/app.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml', # /word/document.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml', # /word/fontTable.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml', # /word/footer.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml', # /word/header.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml', # /word/styles.xml
'application/vnd.openxmlformats-package.core-properties+xml', # /docProps/core.xml
# Do we want to keep the following ones?
'application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml',
# See https://0xacab.org/jvoisin/mat2/issues/71
'application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml', # /word/numbering.xml
}
files_to_omit = set(map(re.compile, { # type: ignore
'^docProps/',
}))
def __remove_revisions(self, full_path: str) -> bool:
""" In this function, we're changing the XML
document in two times, since we don't want
to change the tree we're iterating on."""
tree, ns = _parse_xml(full_path)
# No revisions are present
if tree.find('.//w:del', ns) is None:
return True
elif tree.find('.//w:ins', ns) is None:
def __init__(self, filename):
super().__init__(filename)
self.files_to_keep = set(map(re.compile, { # type: ignore
r'^\[Content_Types\]\.xml$',
r'^_rels/\.rels$',
r'^word/_rels/document\.xml\.rels$',
r'^word/_rels/footer[0-9]*\.xml\.rels$',
r'^word/_rels/header[0-9]*\.xml\.rels$',
# https://msdn.microsoft.com/en-us/library/dd908153(v=office.12).aspx
r'^word/stylesWithEffects\.xml$',
}))
self.files_to_omit = set(map(re.compile, { # type: ignore
r'^customXml/',
r'webSettings\.xml$',
r'^docProps/custom\.xml$',
r'^word/printerSettings/',
r'^word/theme',
r'^word/people\.xml$',
# we have a whitelist in self.files_to_keep,
# so we can trash everything else
r'^word/_rels/',
}))
if self.__fill_files_to_keep_via_content_types() is False:
raise ValueError
def __fill_files_to_keep_via_content_types(self) -> bool:
""" There is a suer-handy `[Content_Types].xml` file
in MS Office archives, describing what each other file contains.
The self.content_types_to_keep member contains a type whitelist,
so we're using it to fill the self.files_to_keep one.
"""
with zipfile.ZipFile(self.filename) as zin:
if '[Content_Types].xml' not in zin.namelist():
return False
xml_data = zin.read('[Content_Types].xml')
self.content_types = dict() # type: Dict[str, str]
try:
tree = ET.fromstring(xml_data)
except ET.ParseError:
return False
for c in tree:
if 'PartName' not in c.attrib or 'ContentType' not in c.attrib:
continue
elif c.attrib['ContentType'] in self.content_types_to_keep:
fname = c.attrib['PartName'][1:] # remove leading `/`
re_fname = re.compile('^' + re.escape(fname) + '$')
self.files_to_keep.add(re_fname) # type: ignore
return True
@staticmethod
def __remove_rsid(full_path: str) -> bool:
""" The method will remove "revision session ID". We're '}rsid'
instead of proper parsing, since rsid can have multiple forms, like
`rsidRDefault`, `rsidR`, `rsids`, …
We're removing rsid tags in two passes, because we can't modify
the xml while we're iterating on it.
For more details, see
- https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
- https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/
"""
try:
tree, namespace = _parse_xml(full_path)
except ET.ParseError:
return False
# rsid, tags or attributes, are always under the `w` namespace
if 'w' not in namespace.keys():
return True
parent_map = {c:p for p in tree.iter() for c in p}
elements = list([element for element in tree.iterfind('.//w:del', ns)])
for element in elements:
elements_to_remove = list()
for item in tree.iterfind('.//', namespace):
if '}rsid' in item.tag.strip().lower(): # rsid as tag
elements_to_remove.append(item)
continue
for key in list(item.attrib.keys()): # rsid as attribute
if '}rsid' in key.lower():
del item.attrib[key]
for element in elements_to_remove:
parent_map[element].remove(element)
elements = list()
for element in tree.iterfind('.//w:ins', ns):
for position, item in enumerate(tree.iter()):
tree.write(full_path, xml_declaration=True)
return True
@staticmethod
def __remove_revisions(full_path: str) -> bool:
""" In this function, we're changing the XML document in several
passes, since we don't want to change the tree we're currently
iterating on.
"""
try:
tree, namespace = _parse_xml(full_path)
except ET.ParseError as e:
logging.error("Unable to parse %s: %s", full_path, e)
return False
# Revisions are either deletions (`w:del`) or
# insertions (`w:ins`)
del_presence = tree.find('.//w:del', namespace)
ins_presence = tree.find('.//w:ins', namespace)
if del_presence is None and ins_presence is None:
return True # No revisions are present
parent_map = {c:p for p in tree.iter() for c in p}
elements_del = list()
for element in tree.iterfind('.//w:del', namespace):
elements_del.append(element)
for element in elements_del:
parent_map[element].remove(element)
elements_ins = list()
for element in tree.iterfind('.//w:ins', namespace):
for position, item in enumerate(tree.iter()): # pragma: no cover
if item == element:
for children in element.iterfind('./*'):
elements.append((element, position, children))
elements_ins.append((element, position, children))
break
for (element, position, children) in elements:
for (element, position, children) in elements_ins:
parent_map[element].insert(position, children)
parent_map[element].remove(element)
tree.write(full_path, xml_declaration=True)
return True
def __remove_content_type_members(self, full_path: str) -> bool:
""" The method will remove the dangling references
from the [Content_Types].xml file, since MS Office doesn't like them
"""
try:
tree, namespace = _parse_xml(full_path)
except ET.ParseError: # pragma: no cover
return False
if len(namespace.items()) != 1:
return False # there should be only one namespace for Types
removed_fnames = set()
with zipfile.ZipFile(self.filename) as zin:
for fname in [item.filename for item in zin.infolist()]:
for file_to_omit in self.files_to_omit:
if file_to_omit.search(fname):
matches = map(lambda r: r.search(fname), self.files_to_keep)
if any(matches): # the file is whitelisted
continue
removed_fnames.add(fname)
break
root = tree.getroot()
for item in root.findall('{%s}Override' % namespace['']):
name = item.attrib['PartName'][1:] # remove the leading '/'
if name in removed_fnames:
root.remove(item)
tree.write(full_path, xml_declaration=True)
return True
def _specific_cleanup(self, full_path: str) -> bool:
if full_path.endswith('/word/document.xml'):
return self.__remove_revisions(full_path)
# pylint: disable=too-many-return-statements
if os.stat(full_path).st_size == 0: # Don't process empty files
return True
if not full_path.endswith('.xml'):
return True
if full_path.endswith('/[Content_Types].xml'):
# this file contains references to files that we might
# remove, and MS Office doesn't like dangling references
if self.__remove_content_type_members(full_path) is False:
return False
elif full_path.endswith('/word/document.xml'):
# this file contains the revisions
if self.__remove_revisions(full_path) is False:
return False
elif full_path.endswith('/docProps/app.xml'):
# This file must be present and valid,
# so we're removing as much as we can.
with open(full_path, 'wb') as f:
f.write(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>')
f.write(b'<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">')
f.write(b'</Properties>')
elif full_path.endswith('/docProps/core.xml'):
# This file must be present and valid,
# so we're removing as much as we can.
with open(full_path, 'wb') as f:
f.write(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>')
f.write(b'<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties">')
f.write(b'</cp:coreProperties>')
if self.__remove_rsid(full_path) is False:
return False
try:
_sort_xml_attributes(full_path)
except ET.ParseError as e: # pragma: no cover
logging.error("Unable to parse %s: %s", full_path, e)
return False
# This is awful, I'm sorry.
#
# Microsoft Office isn't happy when we have the `mc:Ignorable`
# tag containing namespaces that aren't present in the xml file,
# so instead of trying to remove this specific tag with etree,
# we're removing it, with a regexp.
#
# Since we're the ones producing this file, via the call to
# _sort_xml_attributes, there won't be any "funny tricks".
# Worst case, the tag isn't present, and everything is fine.
#
# see: https://docs.microsoft.com/en-us/dotnet/framework/wpf/advanced/mc-ignorable-attribute
with open(full_path, 'rb') as f:
text = f.read()
out = re.sub(b'mc:Ignorable="[^"]*"', b'', text, 1)
with open(full_path, 'wb') as f:
f.write(out)
return True
def get_meta(self) -> Dict[str, str]:
def _specific_get_meta(self, full_path: str, file_path: str) -> Dict[str, Any]:
"""
Yes, I know that parsing xml with regexp ain't pretty,
be my guest and fix it if you want.
"""
metadata = {}
zipin = zipfile.ZipFile(self.filename)
for item in zipin.infolist():
if item.filename.startswith('docProps/') and item.filename.endswith('.xml'):
content = zipin.read(item).decode('utf-8')
try:
results = re.findall(r"<(.+)>(.+)</\1>", content, re.I|re.M)
for (key, value) in results:
metadata[key] = value
except TypeError: # We didn't manage to parse the xml file
pass
if not metadata: # better safe than sorry
metadata[item] = 'harmful content'
for key, value in self._get_zipinfo_meta(item).items():
metadata[key] = value
zipin.close()
return metadata
if not file_path.startswith('docProps/') or not file_path.endswith('.xml'):
return {}
with open(full_path, encoding='utf-8') as f:
try:
results = re.findall(r"<(.+)>(.+)</\1>", f.read(), re.I|re.M)
return {k:v for (k, v) in results}
except (TypeError, UnicodeDecodeError):
# We didn't manage to parse the xml file
return {file_path: 'harmful content', }
class LibreOfficeParser(ArchiveBasedAbstractParser):
@@ -213,59 +322,70 @@ class LibreOfficeParser(ArchiveBasedAbstractParser):
'application/vnd.oasis.opendocument.formula',
'application/vnd.oasis.opendocument.image',
}
files_to_keep = {
'META-INF/manifest.xml',
'content.xml',
'manifest.rdf',
'mimetype',
'settings.xml',
'styles.xml',
}
files_to_omit = set(map(re.compile, { # type: ignore
r'^meta\.xml$',
'^Configurations2/',
'^Thumbnails/',
}))
def __remove_revisions(self, full_path: str) -> bool:
tree, ns = _parse_xml(full_path)
def __init__(self, filename):
super().__init__(filename)
if 'office' not in ns.keys(): # no revisions in the current file
self.files_to_keep = set(map(re.compile, { # type: ignore
r'^META-INF/manifest\.xml$',
r'^content\.xml$',
r'^manifest\.rdf$',
r'^mimetype$',
r'^settings\.xml$',
r'^styles\.xml$',
}))
self.files_to_omit = set(map(re.compile, { # type: ignore
r'^meta\.xml$',
r'^Configurations2/',
r'^Thumbnails/',
}))
@staticmethod
def __remove_revisions(full_path: str) -> bool:
try:
tree, namespace = _parse_xml(full_path)
except ET.ParseError as e:
logging.error("Unable to parse %s: %s", full_path, e)
return False
if 'office' not in namespace.keys(): # no revisions in the current file
return True
for text in tree.getroot().iterfind('.//office:text', ns):
for changes in text.iterfind('.//text:tracked-changes', ns):
for text in tree.getroot().iterfind('.//office:text', namespace):
for changes in text.iterfind('.//text:tracked-changes', namespace):
text.remove(changes)
tree.write(full_path, xml_declaration=True)
return True
def _specific_cleanup(self, full_path: str) -> bool:
if os.path.basename(full_path) == 'content.xml':
return self.__remove_revisions(full_path)
if os.stat(full_path).st_size == 0: # Don't process empty files
return True
if os.path.basename(full_path).endswith('.xml'):
if os.path.basename(full_path) == 'content.xml':
if self.__remove_revisions(full_path) is False:
return False
try:
_sort_xml_attributes(full_path)
except ET.ParseError as e:
logging.error("Unable to parse %s: %s", full_path, e)
return False
return True
def get_meta(self) -> Dict[str, str]:
def _specific_get_meta(self, full_path: str, file_path: str) -> Dict[str, Any]:
"""
Yes, I know that parsing xml with regexp ain't pretty,
be my guest and fix it if you want.
"""
metadata = {}
zipin = zipfile.ZipFile(self.filename)
for item in zipin.infolist():
if item.filename == 'meta.xml':
content = zipin.read(item).decode('utf-8')
try:
results = re.findall(r"<((?:meta|dc|cp).+?)>(.+)</\1>", content, re.I|re.M)
for (key, value) in results:
metadata[key] = value
except TypeError: # We didn't manage to parse the xml file
pass
if not metadata: # better safe than sorry
metadata[item] = 'harmful content'
for key, value in self._get_zipinfo_meta(item).items():
metadata[key] = value
zipin.close()
return metadata
if file_path != 'meta.xml':
return {}
with open(full_path, encoding='utf-8') as f:
try:
results = re.findall(r"<((?:meta|dc|cp).+?)[^>]*>(.+)</\1>", f.read(), re.I|re.M)
return {k:v for (k, v) in results}
except (TypeError, UnicodeDecodeError): # We didn't manage to parse the xml file
# We didn't manage to parse the xml file
return {file_path: 'harmful content', }
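Putting the office pieces above together, a hedged usage sketch; './report.docx' is a placeholder path.

from libmat2.office import MSOfficeParser

p = MSOfficeParser('./report.docx')  # raises ValueError if the file isn't a well-formed archive
print(p.get_meta())                  # docProps fields, zip member timestamps, embedded file metadata, …
p.remove_all()                       # writes ./report.cleaned.docx, with revisions and rsid tags stripped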

View File

@@ -4,24 +4,31 @@ import mimetypes
import importlib
from typing import TypeVar, List, Tuple, Optional
from . import abstract, unsupported_extensions
from . import abstract, UNSUPPORTED_EXTENSIONS
assert Tuple # make pyflakes happy
T = TypeVar('T', bound='abstract.AbstractParser')
def __load_all_parsers():
""" Loads every parser in a dynamic way """
current_dir = os.path.dirname(__file__)
for name in glob.glob(os.path.join(current_dir, '*.py')):
if name.endswith('abstract.py') or name.endswith('__init__.py'):
for fname in glob.glob(os.path.join(current_dir, '*.py')):
if fname.endswith('abstract.py'):
continue
basename = os.path.basename(name)
elif fname.endswith('__init__.py'):
continue
elif fname.endswith('exiftool.py'):
continue
basename = os.path.basename(fname)
name, _ = os.path.splitext(basename)
importlib.import_module('.' + name, package='libmat2')
__load_all_parsers()
def _get_parsers() -> List[T]:
""" Get all our parsers!"""
def __get_parsers(cls):
@@ -31,10 +38,11 @@ def _get_parsers() -> List[T]:
def get_parser(filename: str) -> Tuple[Optional[T], Optional[str]]:
""" Return the appropriate parser for a giver filename. """
mtype, _ = mimetypes.guess_type(filename)
_, extension = os.path.splitext(filename)
if extension in unsupported_extensions:
if extension.lower() in UNSUPPORTED_EXTENSIONS:
return None, mtype
for parser_class in _get_parsers(): # type: ignore

View File

@@ -7,6 +7,7 @@ import re
import logging
import tempfile
import io
from typing import Dict, Union
from distutils.version import LooseVersion
import cairo
@@ -16,12 +17,10 @@ from gi.repository import Poppler, GLib
from . import abstract
logging.basicConfig(level=logging.DEBUG)
poppler_version = Poppler.get_version()
if LooseVersion(poppler_version) < LooseVersion('0.46'):
if LooseVersion(poppler_version) < LooseVersion('0.46'): # pragma: no cover
raise ValueError("MAT2 needs at least Poppler version 0.46 to work. \
The installed version is %s." % poppler_version)
The installed version is %s." % poppler_version) # pragma: no cover
class PDFParser(abstract.AbstractParser):
@@ -39,7 +38,12 @@ class PDFParser(abstract.AbstractParser):
except GLib.GError: # Invalid PDF
raise ValueError
def remove_all_lightweight(self):
def remove_all(self) -> bool:
if self.lightweight_cleaning is True:
return self.__remove_all_lightweight()
return self.__remove_all_thorough()
def __remove_all_lightweight(self) -> bool:
"""
Load the document into Poppler, render pages on a new PDFSurface.
"""
@@ -47,7 +51,7 @@ class PDFParser(abstract.AbstractParser):
pages_count = document.get_n_pages()
tmp_path = tempfile.mkstemp()[1]
pdf_surface = cairo.PDFSurface(tmp_path, 10, 10)
pdf_surface = cairo.PDFSurface(tmp_path, 10, 10) # resized later anyway
pdf_context = cairo.Context(pdf_surface) # context draws on the surface
for pagenum in range(pages_count):
@@ -66,7 +70,7 @@ class PDFParser(abstract.AbstractParser):
return True
def remove_all(self):
def __remove_all_thorough(self) -> bool:
"""
Load the document into Poppler, render pages on PNG,
and shove those PNG into a new PDF.
@@ -101,7 +105,7 @@ class PDFParser(abstract.AbstractParser):
pdf_surface.set_size(page_width*self.__scale, page_height*self.__scale)
pdf_context.set_source_surface(img, 0, 0)
pdf_context.paint()
pdf_context.show_page()
pdf_context.show_page() # draw pdf_context on pdf_surface
pdf_surface.finish()
@@ -120,15 +124,14 @@ class PDFParser(abstract.AbstractParser):
document.save('file://' + os.path.abspath(out_file))
return True
@staticmethod
def __parse_metadata_field(data: str) -> dict:
def __parse_metadata_field(data: str) -> Dict[str, str]:
metadata = {}
for (_, key, value) in re.findall(r"<(xmp|pdfx|pdf|xmpMM):(.+)>(.+)</\1:\2>", data, re.I):
metadata[key] = value
return metadata
def get_meta(self):
def get_meta(self) -> Dict[str, Union[str, dict]]:
""" Return a dict with all the meta of the file
"""
metadata = {}

libmat2/subprocess.py (new file, 105 lines)
View File

@@ -0,0 +1,105 @@
"""
Wrapper around a subset of the subprocess module,
that uses bwrap (bubblewrap) when it is available.
Instead of importing subprocess, other modules should use this as follows:
from . import subprocess
"""
import os
import shutil
import subprocess
import tempfile
from typing import List, Optional
__all__ = ['PIPE', 'run', 'CalledProcessError']
PIPE = subprocess.PIPE
CalledProcessError = subprocess.CalledProcessError
def _get_bwrap_path() -> str:
bwrap_path = '/usr/bin/bwrap'
if os.path.isfile(bwrap_path):
if os.access(bwrap_path, os.X_OK):
return bwrap_path
raise RuntimeError("Unable to find bwrap") # pragma: no cover
# pylint: disable=bad-whitespace
def _get_bwrap_args(tempdir: str,
input_filename: str,
output_filename: Optional[str] = None) -> List[str]:
ro_bind_args = []
cwd = os.getcwd()
# XXX: use --ro-bind-try once all supported platforms
# have a bubblewrap recent enough to support it.
ro_bind_dirs = ['/usr', '/lib', '/lib64', '/bin', '/sbin', cwd]
for bind_dir in ro_bind_dirs:
if os.path.isdir(bind_dir): # pragma: no cover
ro_bind_args.extend(['--ro-bind', bind_dir, bind_dir])
ro_bind_files = ['/etc/ld.so.cache']
for bind_file in ro_bind_files:
if os.path.isfile(bind_file): # pragma: no cover
ro_bind_args.extend(['--ro-bind', bind_file, bind_file])
args = ro_bind_args + \
['--dev', '/dev',
'--chdir', cwd,
'--unshare-all',
'--new-session',
# XXX: enable --die-with-parent once all supported platforms have
# a bubblewrap recent enough to support it.
# '--die-with-parent',
]
if output_filename:
# Mount an empty temporary directory where the sandboxed
# process will create its output file
output_dirname = os.path.dirname(os.path.abspath(output_filename))
args.extend(['--bind', tempdir, output_dirname])
absolute_input_filename = os.path.abspath(input_filename)
args.extend(['--ro-bind', absolute_input_filename, absolute_input_filename])
return args
# pylint: disable=bad-whitespace
def run(args: List[str],
input_filename: str,
output_filename: Optional[str] = None,
**kwargs) -> subprocess.CompletedProcess:
"""Wrapper around `subprocess.run`, that uses bwrap (bubblewrap) if it
is available.
Extra supported keyword arguments:
- `input_filename`, made available read-only in the sandbox
- `output_filename`, where the file created by the sandboxed process
is copied upon successful completion; an empty temporary directory
is made visible as the parent directory of this file in the sandbox.
Optional: one valid use case is to invoke an external process
to inspect metadata present in a file.
"""
try:
bwrap_path = _get_bwrap_path()
except RuntimeError: # pragma: no cover
# bubblewrap is not installed ⇒ short-circuit
return subprocess.run(args, **kwargs)
with tempfile.TemporaryDirectory() as tempdir:
prefix_args = [bwrap_path] + \
_get_bwrap_args(input_filename=input_filename,
output_filename=output_filename,
tempdir=tempdir)
completed_process = subprocess.run(prefix_args + args, **kwargs)
if output_filename and completed_process.returncode == 0:
shutil.copy(os.path.join(tempdir, os.path.basename(output_filename)),
output_filename)
return completed_process
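This is how the other modules consume the wrapper (see libmat2/exiftool.py above); a minimal sketch, assuming exiftool is installed at /usr/bin/exiftool and ./photo.jpg exists.

from libmat2 import subprocess  # the sandboxing wrapper above, not the stdlib module

out = subprocess.run(['/usr/bin/exiftool', '-json', './photo.jpg'],
                     input_filename='./photo.jpg',  # bind-mounted read-only inside the sandbox
                     check=True, stdout=subprocess.PIPE).stdout
print(out.decode('utf-8'))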

View File

@@ -8,33 +8,32 @@ class TorrentParser(abstract.AbstractParser):
mimetypes = {'application/x-bittorrent', }
whitelist = {b'announce', b'announce-list', b'info'}
def get_meta(self) -> Dict[str, str]:
metadata = {}
def __init__(self, filename):
super().__init__(filename)
with open(self.filename, 'rb') as f:
d = _BencodeHandler().bdecode(f.read())
if d is None:
return {'Unknown meta': 'Unable to parse torrent file "%s".' % self.filename}
for k, v in d.items():
if k not in self.whitelist:
metadata[k.decode('utf-8')] = v
return metadata
self.dict_repr = _BencodeHandler().bdecode(f.read())
if self.dict_repr is None:
raise ValueError
def get_meta(self) -> Dict[str, Union[str, dict]]:
metadata = {}
for key, value in self.dict_repr.items():
if key not in self.whitelist:
metadata[key.decode('utf-8')] = value
return metadata
def remove_all(self) -> bool:
cleaned = dict()
with open(self.filename, 'rb') as f:
d = _BencodeHandler().bdecode(f.read())
if d is None:
return False
for k, v in d.items():
if k in self.whitelist:
cleaned[k] = v
for key, value in self.dict_repr.items():
if key in self.whitelist:
cleaned[key] = value
with open(self.output_filename, 'wb') as f:
f.write(_BencodeHandler().bencode(cleaned))
self.dict_repr = cleaned # since we're stateful
return True
class _BencodeHandler(object):
class _BencodeHandler():
"""
Since bencode isn't that hard to parse,
MAT2 comes with its own parser, based on the spec
@@ -60,8 +59,6 @@ class _BencodeHandler(object):
def __decode_int(s: bytes) -> Tuple[int, bytes]:
s = s[1:]
next_idx = s.index(b'e')
if next_idx is None:
raise ValueError # missing suffix
if s.startswith(b'-0'):
raise ValueError # negative zero doesn't exist
elif s.startswith(b'0') and next_idx != 1:
@@ -70,32 +67,30 @@ class _BencodeHandler(object):
@staticmethod
def __decode_string(s: bytes) -> Tuple[bytes, bytes]:
sep = s.index(b':')
if set is None:
raise ValueError # missing suffix
str_len = int(s[:sep])
if str_len < 0:
raise ValueError
elif s[0] == b'0' and sep != 1:
colon = s.index(b':')
# FIXME Python3 is broken here, the call to `ord` shouldn't be needed,
# but apparently it is. This is utterly idiotic.
if (s[0] == ord('0') or s[0] == '0') and colon != 1:
raise ValueError
str_len = int(s[:colon])
s = s[1:]
return s[sep:sep+str_len], s[sep+str_len:]
return s[colon:colon+str_len], s[colon+str_len:]
def __decode_list(self, s: bytes) -> Tuple[list, bytes]:
r = list()
ret = list()
s = s[1:] # skip leading `l`
while s[0] != ord('e'):
v, s = self.__decode_func[s[0]](s)
r.append(v)
return r, s[1:]
value, s = self.__decode_func[s[0]](s)
ret.append(value)
return ret, s[1:]
def __decode_dict(self, s: bytes) -> Tuple[dict, bytes]:
r = dict()
ret = dict()
s = s[1:] # skip leading `d`
while s[0] != ord(b'e'):
k, s = self.__decode_string(s)
r[k], s = self.__decode_func[s[0]](s)
return r, s[1:]
key, s = self.__decode_string(s)
ret[key], s = self.__decode_func[s[0]](s)
return ret, s[1:]
@staticmethod
def __encode_int(x: bytes) -> bytes:
@@ -113,9 +108,9 @@ class _BencodeHandler(object):
def __encode_dict(self, x: dict) -> bytes:
ret = b''
for k, v in sorted(x.items()):
ret += self.__encode_func[type(k)](k)
ret += self.__encode_func[type(v)](v)
for key, value in sorted(x.items()):
ret += self.__encode_func[type(key)](key)
ret += self.__encode_func[type(value)](value)
return b'd' + ret + b'e'
def bencode(self, s: Union[dict, list, bytes, int]) -> bytes:
@@ -123,11 +118,11 @@ class _BencodeHandler(object):
def bdecode(self, s: bytes) -> Union[dict, None]:
try:
r, l = self.__decode_func[s[0]](s)
ret, trail = self.__decode_func[s[0]](s)
except (IndexError, KeyError, ValueError) as e:
logging.debug("Not a valid bencoded string: %s", e)
logging.warning("Not a valid bencoded string: %s", e)
return None
if l != b'':
logging.debug("Invalid bencoded value (data after valid prefix)")
if trail != b'':
logging.warning("Invalid bencoded value (data after valid prefix)")
return None
return r
return ret
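# Quick round-trip sketch of the handler above; the bencoded sample is made up,
# and the module-private class is imported here only for the illustration.
from libmat2.torrent import _BencodeHandler

handler = _BencodeHandler()
sample = b'd8:announce3:url4:infod6:lengthi42eee'
decoded = handler.bdecode(sample)
assert decoded == {b'announce': b'url', b'info': {b'length': 42}}
assert handler.bencode(decoded) == sample   # keys come back sorted, so this round-trips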

libmat2/video.py Normal file

@@ -0,0 +1,139 @@
import os
import logging
from typing import Dict, Union
from . import exiftool
from . import subprocess
class AbstractFFmpegParser(exiftool.ExiftoolParser):
""" Abstract parser for all FFmpeg-based ones, mainly for video. """
# Some file formats have mandatory metadata fields
meta_key_value_whitelist = {} # type: Dict[str, Union[str, int]]
def remove_all(self) -> bool:
if self.meta_key_value_whitelist:
logging.warning('The format of "%s" (%s) has some mandatory '
'metadata fields; mat2 filled them with standard '
'data.', self.filename, ', '.join(self.mimetypes))
cmd = [_get_ffmpeg_path(),
'-i', self.filename, # input file
'-y', # overwrite existing output file
'-map', '0', # copy all streams from input to output
'-codec', 'copy', # don't decode anything, just copy (speed!)
'-loglevel', 'panic', # Don't show log
'-hide_banner', # hide the banner
'-map_metadata', '-1', # remove superficial metadata
'-map_chapters', '-1', # remove chapters
'-disposition', '0', # Remove dispositions (check ffmpeg's manpage)
'-fflags', '+bitexact', # don't add any metadata
'-flags:v', '+bitexact', # don't add any metadata
'-flags:a', '+bitexact', # don't add any metadata
self.output_filename]
try:
subprocess.run(cmd, check=True,
input_filename=self.filename,
output_filename=self.output_filename)
except subprocess.CalledProcessError as e:
logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
return False
return True
def get_meta(self) -> Dict[str, Union[str, dict]]:
meta = super().get_meta()
ret = dict() # type: Dict[str, Union[str, dict]]
for key, value in meta.items():
if key in self.meta_key_value_whitelist.keys():
if value == self.meta_key_value_whitelist[key]:
continue
ret[key] = value
return ret
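# Sketch of the key/value whitelisting performed by get_meta() above, with
# made-up metadata (the *_real key is purely hypothetical):
whitelist = {'CreationDate': '0000:00:00 00:00:00Z'}      # mandatory placeholder
meta = {'CreationDate': '0000:00:00 00:00:00Z',           # placeholder value, hidden
        'CreationDate_real': '2018:10:28 07:41:04'}       # real value, reported
shown = {k: v for k, v in meta.items()
         if not (k in whitelist and whitelist[k] == v)}
assert shown == {'CreationDate_real': '2018:10:28 07:41:04'}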
class WMVParser(AbstractFFmpegParser):
mimetypes = {'video/x-ms-wmv', }
meta_whitelist = {'AudioChannels', 'AudioCodecID', 'AudioCodecName',
'ErrorCorrectionType', 'AudioSampleRate', 'DataPackets',
'Directory', 'Duration', 'ExifToolVersion',
'FileAccessDate', 'FileInodeChangeDate', 'FileLength',
'FileModifyDate', 'FileName', 'FilePermissions',
'FileSize', 'FileType', 'FileTypeExtension',
'FrameCount', 'FrameRate', 'ImageHeight', 'ImageSize',
'ImageWidth', 'MIMEType', 'MaxBitrate', 'MaxPacketSize',
'Megapixels', 'MinPacketSize', 'Preroll', 'SendDuration',
'SourceFile', 'StreamNumber', 'VideoCodecName', }
meta_key_value_whitelist = { # some metadata are mandatory :/
'AudioCodecDescription': '',
'CreationDate': '0000:00:00 00:00:00Z',
'FileID': '00000000-0000-0000-0000-000000000000',
'Flags': 2, # FIXME: What is this? Why 2?
'ModifyDate': '0000:00:00 00:00:00',
'TimeOffset': '0 s',
'VideoCodecDescription': '',
'StreamType': 'Audio',
}
class AVIParser(AbstractFFmpegParser):
mimetypes = {'video/x-msvideo', }
meta_whitelist = {'SourceFile', 'ExifToolVersion', 'FileName', 'Directory',
'FileSize', 'FileModifyDate', 'FileAccessDate',
'FileInodeChangeDate', 'FilePermissions', 'FileType',
'FileTypeExtension', 'MIMEType', 'FrameRate', 'MaxDataRate',
'FrameCount', 'StreamCount', 'StreamType', 'VideoCodec',
'VideoFrameRate', 'VideoFrameCount', 'Quality',
'SampleSize', 'BMPVersion', 'ImageWidth', 'ImageHeight',
'Planes', 'BitDepth', 'Compression', 'ImageLength',
'PixelsPerMeterX', 'PixelsPerMeterY', 'NumColors',
'NumImportantColors', 'NumColors', 'NumImportantColors',
'RedMask', 'GreenMask', 'BlueMask', 'AlphaMask',
'ColorSpace', 'AudioCodec', 'AudioCodecRate',
'AudioSampleCount', 'AudioSampleCount',
'AudioSampleRate', 'Encoding', 'NumChannels',
'SampleRate', 'AvgBytesPerSec', 'BitsPerSample',
'Duration', 'ImageSize', 'Megapixels'}
class MP4Parser(AbstractFFmpegParser):
mimetypes = {'video/mp4', }
meta_whitelist = {'AudioFormat', 'AvgBitrate', 'Balance', 'TrackDuration',
'XResolution', 'YResolution', 'ExifToolVersion',
'FileAccessDate', 'FileInodeChangeDate', 'FileModifyDate',
'FileName', 'FilePermissions', 'MIMEType', 'FileType',
'FileTypeExtension', 'Directory', 'ImageWidth',
'ImageSize', 'ImageHeight', 'FileSize', 'SourceFile',
'BitDepth', 'Duration', 'AudioChannels',
'AudioBitsPerSample', 'AudioSampleRate', 'Megapixels',
'MovieDataSize', 'VideoFrameRate', 'MediaTimeScale',
'SourceImageHeight', 'SourceImageWidth',
'MatrixStructure', 'MediaDuration'}
meta_key_value_whitelist = { # some metadata are mandatory :/
'CreateDate': '0000:00:00 00:00:00',
'CurrentTime': '0 s',
'MediaCreateDate': '0000:00:00 00:00:00',
'MediaLanguageCode': 'und',
'MediaModifyDate': '0000:00:00 00:00:00',
'ModifyDate': '0000:00:00 00:00:00',
'OpColor': '0 0 0',
'PosterTime': '0 s',
'PreferredRate': '1',
'PreferredVolume': '100.00%',
'PreviewDuration': '0 s',
'PreviewTime': '0 s',
'SelectionDuration': '0 s',
'SelectionTime': '0 s',
'TrackCreateDate': '0000:00:00 00:00:00',
'TrackModifyDate': '0000:00:00 00:00:00',
'TrackVolume': '0.00%',
}
def _get_ffmpeg_path() -> str: # pragma: no cover
ffmpeg_path = '/usr/bin/ffmpeg'
if os.path.isfile(ffmpeg_path):
if os.access(ffmpeg_path, os.X_OK):
return ffmpeg_path
raise RuntimeError("Unable to find ffmpeg")

mat2

@@ -1,22 +1,30 @@
#!/usr/bin/python3
#!/usr/bin/env python3
import os
from typing import Tuple
from typing import Tuple, Generator, List, Union
import sys
import itertools
import mimetypes
import argparse
import multiprocessing
import logging
import unicodedata
try:
from libmat2 import parser_factory, unsupported_extensions
from libmat2 import parser_factory, UNSUPPORTED_EXTENSIONS
from libmat2 import check_dependencies, UnknownMemberPolicy
except ValueError as e:
print(e)
sys.exit(1)
__version__ = '0.1.3'
__version__ = '0.7.0'
def __check_file(filename: str, mode: int = os.R_OK) -> bool:
# Make pyflakes happy
assert Tuple
assert Union
logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.WARNING)
def __check_file(filename: str, mode: int=os.R_OK) -> bool:
if not os.path.exists(filename):
print("[-] %s is doesn't exist." % filename)
return False
@@ -29,19 +37,25 @@ def __check_file(filename: str, mode: int = os.R_OK) -> bool:
return True
def create_arg_parser():
def create_arg_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(description='Metadata anonymisation toolkit 2')
parser.add_argument('files', nargs='*')
parser.add_argument('files', nargs='*', help='the files to process')
parser.add_argument('-v', '--version', action='version',
version='MAT2 %s' % __version__)
parser.add_argument('-l', '--list', action='store_true',
help='list all supported fileformats')
parser.add_argument('--check-dependencies', action='store_true',
help='check if MAT2 has all the dependencies it needs')
parser.add_argument('-V', '--verbose', action='store_true',
help='show more verbose status information')
parser.add_argument('--unknown-members', metavar='policy', default='abort',
help='how to handle unknown members of archive-style files (policy should' +
' be one of: %s)' % ', '.join(p.value for p in UnknownMemberPolicy))
info = parser.add_mutually_exclusive_group()
info.add_argument('-c', '--check', action='store_true',
help='check if a file is free of harmful metadata')
info.add_argument('-s', '--show', action='store_true',
help='list all the harmful metadata of a file without removing them')
help='list harmful metadata detectable by MAT2 without removing them')
info.add_argument('-L', '--lightweight', action='store_true',
help='remove SOME metadata')
return parser
@@ -55,16 +69,37 @@ def show_meta(filename: str):
if p is None:
print("[-] %s's format (%s) is not supported" % (filename, mtype))
return
__print_meta(filename, p.get_meta())
def __print_meta(filename: str, metadata: dict, depth: int=1):
padding = " " * depth*2
if not metadata:
print(padding + "No metadata found")
return
print("[%s] Metadata for %s:" % ('+'*depth, filename))
for (k, v) in sorted(metadata.items()):
if isinstance(v, dict):
__print_meta(k, v, depth+1)
continue
# Remove control characters
# We might use 'Cc' instead of 'C', but better safe than sorry
# https://www.unicode.org/reports/tr44/#GC_Values_Table
try:
v = ''.join(ch for ch in v if not unicodedata.category(ch).startswith('C'))
except TypeError:
pass # for things that aren't iterable
print("[+] Metadata for %s:" % filename)
for k, v in p.get_meta().items():
try: # FIXME this is ugly.
print(" %s: %s" % (k, v))
print(padding + " %s: %s" % (k, v))
except UnicodeEncodeError:
print(" %s: harmful content" % k)
print(padding + " %s: harmful content" % k)
def clean_meta(params: Tuple[str, bool]) -> bool:
filename, is_lightweigth = params
def clean_meta(filename: str, is_lightweight: bool, policy: UnknownMemberPolicy) -> bool:
if not __check_file(filename, os.R_OK|os.W_OK):
return False
@@ -72,29 +107,35 @@ def clean_meta(params: Tuple[str, bool]) -> bool:
if p is None:
print("[-] %s's format (%s) is not supported" % (filename, mtype))
return False
if is_lightweigth:
return p.remove_all_lightweight()
return p.remove_all()
p.unknown_member_policy = policy
p.lightweight_cleaning = is_lightweight
try:
return p.remove_all()
except RuntimeError as e:
print("[-] %s can't be cleaned: %s" % (filename, e))
return False
def show_parsers():
print('[+] Supported formats:')
formats = list()
for parser in parser_factory._get_parsers():
formats = set() # Set[str]
for parser in parser_factory._get_parsers(): # type: ignore
for mtype in parser.mimetypes:
extensions = set()
extensions = set() # Set[str]
for extension in mimetypes.guess_all_extensions(mtype):
if extension[1:] not in unsupported_extensions: # skip the dot
if extension not in UNSUPPORTED_EXTENSIONS:
extensions.add(extension)
if not extensions:
# we're not supporting a single extension in the current
# mimetype, so there is no point in showing the mimetype at all
continue
formats.append(' - %s (%s)' % (mtype, ', '.join(extensions)))
formats.add(' - %s (%s)' % (mtype, ', '.join(extensions)))
print('\n'.join(sorted(formats)))
def __get_files_recursively(files):
def __get_files_recursively(files: List[str]) -> Generator[str, None, None]:
for f in files:
if os.path.isdir(f):
for path, _, _files in os.walk(f):
@@ -105,14 +146,23 @@ def __get_files_recursively(files):
elif __check_file(f):
yield f
def main():
def main() -> int:
arg_parser = create_arg_parser()
args = arg_parser.parse_args()
if args.verbose:
logging.basicConfig(level=logging.INFO)
if not args.files:
if not args.list:
return arg_parser.print_help()
show_parsers()
if args.list:
show_parsers()
return 0
elif args.check_dependencies:
print("Dependencies required for MAT2 %s:" % __version__)
for key, value in sorted(check_dependencies().items()):
print('- %s: %s' % (key, 'yes' if value else 'no'))
else:
arg_parser.print_help()
return 0
elif args.show:
@@ -121,12 +171,16 @@ def main():
return 0
else:
p = multiprocessing.Pool()
mode = (args.lightweight is True)
l = zip(__get_files_recursively(args.files), itertools.repeat(mode))
policy = UnknownMemberPolicy(args.unknown_members)
if policy == UnknownMemberPolicy.KEEP:
logging.warning('Keeping unknown member files may leak metadata in the resulting file!')
no_failure = True
for f in __get_files_recursively(args.files):
if clean_meta(f, args.lightweight, policy) is False:
no_failure = False
return 0 if no_failure is True else -1
ret = list(p.imap_unordered(clean_meta, list(l)))
return 0 if all(ret) else -1
if __name__ == '__main__':
sys.exit(main())

nautilus/README.md Normal file

@@ -0,0 +1,12 @@
# mat2's Nautilus extension
# Dependencies
- Nautilus (now known as [Files](https://wiki.gnome.org/action/show/Apps/Files))
- [nautilus-python](https://gitlab.gnome.org/GNOME/nautilus-python) >= 2.10
# Installation
Simply copy the `mat2.py` file to `~/.local/share/nautilus-python/extensions`,
and launch Nautilus; you should now have a "Remove metadata" item in the
right-click menu on supported files.

nautilus/mat2.py Normal file

@@ -0,0 +1,244 @@
#!/usr/bin/env python3
"""
Because writing a GUI is non-trivial (cf. https://0xacab.org/jvoisin/mat2/issues/3),
we decided to write a Nautilus extension instead
(cf. https://0xacab.org/jvoisin/mat2/issues/2).
The code is a little bit convoluted because Gtk isn't thread-safe:
we're not allowed to call anything Gtk-related outside of the main
thread, so we have to resort to using a `queue` to pass "messages" around.
"""
# pylint: disable=no-name-in-module,unused-argument,no-self-use,import-error
import queue
import threading
from typing import Tuple, Optional, List
from urllib.parse import unquote
import gi
gi.require_version('Nautilus', '3.0')
gi.require_version('Gtk', '3.0')
gi.require_version('GdkPixbuf', '2.0')
from gi.repository import Nautilus, GObject, Gtk, Gio, GLib, GdkPixbuf
from libmat2 import parser_factory
def _remove_metadata(fpath) -> Tuple[bool, Optional[str]]:
""" This is a simple wrapper around libmat2, because it's
easier and cleaner this way.
"""
parser, mtype = parser_factory.get_parser(fpath)
if parser is None:
return False, mtype
return parser.remove_all(), mtype
class Mat2Extension(GObject.GObject, Nautilus.MenuProvider, Nautilus.LocationWidgetProvider):
""" This class adds an item to the right-clic menu in Nautilus. """
def __init__(self):
super().__init__()
self.infobar_hbox = None
self.infobar = None
self.failed_items = list()
def __infobar_failure(self):
""" Add an hbox to the `infobar` warning about the fact that we didn't
manage to remove the metadata from every single file.
"""
self.infobar.set_show_close_button(True)
self.infobar_hbox = Gtk.Box(orientation=Gtk.Orientation.HORIZONTAL)
btn = Gtk.Button("Show")
btn.connect("clicked", self.__cb_show_failed)
self.infobar_hbox.pack_end(btn, False, False, 0)
infobar_msg = Gtk.Label("Failed to clean some items")
self.infobar_hbox.pack_start(infobar_msg, False, False, 0)
self.infobar.get_content_area().pack_start(self.infobar_hbox, True, True, 0)
self.infobar.show_all()
def get_widget(self, uri, window) -> Gtk.Widget:
""" This is the method that we have to implement (because we're
a LocationWidgetProvider) in order to show our infobar.
"""
self.infobar = Gtk.InfoBar()
self.infobar.set_message_type(Gtk.MessageType.ERROR)
self.infobar.connect("response", self.__cb_infobar_response)
return self.infobar
def __cb_infobar_response(self, infobar, response):
""" Callback for the infobar close button.
"""
if response == Gtk.ResponseType.CLOSE:
self.infobar_hbox.destroy()
self.infobar.hide()
def __cb_show_failed(self, button):
""" Callback to show a popup containing a list of files
that we didn't manage to clean.
"""
# FIXME this should be done only once the window is destroyed
self.infobar_hbox.destroy()
self.infobar.hide()
window = Gtk.Window()
headerbar = Gtk.HeaderBar()
window.set_titlebar(headerbar)
headerbar.props.title = "Metadata removal failed"
close_buton = Gtk.Button("Close")
close_buton.connect("clicked", lambda _: window.close())
headerbar.pack_end(close_buton)
box = Gtk.Box(orientation=Gtk.Orientation.VERTICAL)
window.add(box)
box.add(self.__create_treeview())
window.show_all()
@staticmethod
def __validate(fileinfo) -> Tuple[bool, str]:
""" Validate if a given file FileInfo `fileinfo` can be processed.
Returns a boolean, and a textreason why"""
if fileinfo.get_uri_scheme() != "file" or fileinfo.is_directory():
return False, "Not a file"
elif not fileinfo.can_write():
return False, "Not writeable"
return True, ""
def __create_treeview(self) -> Gtk.TreeView:
liststore = Gtk.ListStore(GdkPixbuf.Pixbuf, str, str)
treeview = Gtk.TreeView(model=liststore)
renderer_pixbuf = Gtk.CellRendererPixbuf()
column_pixbuf = Gtk.TreeViewColumn("Icon", renderer_pixbuf, pixbuf=0)
treeview.append_column(column_pixbuf)
for idx, name in enumerate(['File', 'Reason']):
renderer_text = Gtk.CellRendererText()
column_text = Gtk.TreeViewColumn(name, renderer_text, text=idx+1)
treeview.append_column(column_text)
for (fname, mtype, reason) in self.failed_items:
# This part is all about adding mimetype icons to the liststore
icon = Gio.content_type_get_icon('text/plain' if not mtype else mtype)
# in case we don't have the corresponding icon,
# we're adding `text/plain`, because we have this one for sure™
names = icon.get_names() + ['text/plain', ]
icon_theme = Gtk.IconTheme.get_default()
for name in names:
try:
img = icon_theme.load_icon(name, Gtk.IconSize.BUTTON, 0)
break
except GLib.GError:
pass
liststore.append([img, fname, reason])
treeview.show_all()
return treeview
def __create_progressbar(self) -> Gtk.ProgressBar:
""" Create the progressbar used to notify that files are currently
being processed.
"""
self.infobar.set_show_close_button(False)
self.infobar.set_message_type(Gtk.MessageType.INFO)
self.infobar_hbox = Gtk.Box(orientation=Gtk.Orientation.HORIZONTAL)
progressbar = Gtk.ProgressBar()
self.infobar_hbox.pack_start(progressbar, True, True, 0)
progressbar.set_show_text(True)
self.infobar.get_content_area().pack_start(self.infobar_hbox, True, True, 0)
self.infobar.show_all()
return progressbar
def __update_progressbar(self, processing_queue, progressbar) -> bool:
""" This method is run via `Glib.add_idle` to update the progressbar."""
try:
fname = processing_queue.get(block=False)
except queue.Empty:
return True
# `None` is the marker put in the queue to signal that every selected
# file was processed.
if fname is None:
self.infobar_hbox.destroy()
self.infobar.hide()
if len(self.failed_items):
self.__infobar_failure()
if not processing_queue.empty():
print("Something went wrong, the queue isn't empty :/")
return False
progressbar.pulse()
progressbar.set_text("Cleaning %s" % fname)
progressbar.show_all()
self.infobar_hbox.show_all()
self.infobar.show_all()
return True
def __clean_files(self, files: list, processing_queue: queue.Queue) -> bool:
""" This method is threaded in order to avoid blocking the GUI
while cleaning up the files.
"""
for fileinfo in files:
fname = fileinfo.get_name()
processing_queue.put(fname)
valid, reason = self.__validate(fileinfo)
if not valid:
self.failed_items.append((fname, None, reason))
continue
fpath = unquote(fileinfo.get_uri()[7:]) # `len('file://') = 7`
success, mtype = _remove_metadata(fpath)
if not success:
self.failed_items.append((fname, mtype, 'Unsupported/invalid'))
processing_queue.put(None) # signal that we processed all the files
return True
def __cb_menu_activate(self, menu, files):
""" This method is called when the user clicked the "clean metadata"
menu item.
"""
self.failed_items = list()
progressbar = self.__create_progressbar()
progressbar.set_pulse_step(1.0 / len(files))
self.infobar.show_all()
processing_queue = queue.Queue()
GLib.idle_add(self.__update_progressbar, processing_queue, progressbar)
thread = threading.Thread(target=self.__clean_files, args=(files, processing_queue))
thread.daemon = True
thread.start()
def get_background_items(self, window, file):
""" https://bugzilla.gnome.org/show_bug.cgi?id=784278 """
return None
def get_file_items(self, window, files) -> Optional[List[Nautilus.MenuItem]]:
""" This method is the one allowing us to create a menu item.
"""
# Do not show the menu item if not a single file has a chance to be
# processed by mat2.
if not any([is_valid for (is_valid, _) in map(self.__validate, files)]):
return None
item = Nautilus.MenuItem(
name="MAT2::Remove_metadata",
label="Remove metadata",
tip="Remove metadata"
)
item.connect('activate', self.__cb_menu_activate, files)
return [item, ]


@@ -1,29 +0,0 @@
#!/usr/bin/env python3
import gi
gi.require_version('Nautilus', '3.0')
from gi.repository import Nautilus, GObject
class ColumnExtension(GObject.GObject, Nautilus.MenuProvider):
def menu_activate_cb(self, menu, file):
print "menu_activate_cb", file
# TODO: clean metadata here
def get_background_items(self, window, file):
""" https://bugzilla.gnome.org/show_bug.cgi?id=784278 """
return None
def get_file_items(self, window, files):
if len(files) != 1: # we're not supporting multiple files for now
return
file = files[0]
item = Nautilus.MenuItem(
name="MAT2::Remove_metadata",
label="Remove metadata from %s" % file.get_name(),
tip="Remove metadata from %s" % file.get_name()
)
item.connect('activate', self.menu_activate_cb, file)
return [item]


@@ -5,7 +5,7 @@ with open("README.md", "r") as fh:
setuptools.setup(
name="mat2",
version='0.1.3',
version='0.7.0',
author="Julien (jvoisin) Voisin",
author_email="julien.voisin+mat2@dustri.org",
description="A handy tool to trash your metadata",
@@ -20,7 +20,7 @@ setuptools.setup(
'pycairo',
],
packages=setuptools.find_packages(exclude=('tests', )),
classifiers=(
classifiers=[
"Development Status :: 3 - Alpha",
"Environment :: Console",
"License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)",
@@ -28,7 +28,7 @@ setuptools.setup(
"Programming Language :: Python :: 3 :: Only",
"Topic :: Security",
"Intended Audience :: End Users/Desktop",
),
],
project_urls={
'bugtracker': 'https://0xacab.org/jvoisin/mat2/issues',
},

BIN
tests/data/dirty.avi Normal file

BIN
tests/data/dirty.gif Normal file

tests/data/dirty.html Normal file

@@ -0,0 +1,14 @@
<html>
<head>
<meta content="vim" name="generator"/>
<meta content="jvoisin" name="author"/>
</head>
<body>
<p>
<h1>Hello</h1>
I am a web page.
Please <b>love</b> me.
Here, have a pretty picture: <img src='dirty.jpg' alt='a pretty picture'/>
</p>
</body>
</html>

BIN
tests/data/dirty.mp4 Normal file

BIN
tests/data/dirty.wmv Normal file


@@ -4,62 +4,102 @@ import subprocess
import unittest
mat2_binary = ['./mat2']
if 'MAT2_GLOBAL_PATH_TESTSUITE' in os.environ:
# Debian runs tests after installing the package
# https://0xacab.org/jvoisin/mat2/issues/16#note_153878
mat2_binary = ['/usr/bin/env', 'mat2']
class TestHelp(unittest.TestCase):
def test_help(self):
proc = subprocess.Popen(['./mat2', '--help'], stdout=subprocess.PIPE)
proc = subprocess.Popen(mat2_binary + ['--help'], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [-c | -s | -L] [files [files ...]]', stdout)
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]',
stdout)
self.assertIn(b'[--unknown-members policy] [-s | -L]', stdout)
def test_no_arg(self):
proc = subprocess.Popen(['./mat2'], stdout=subprocess.PIPE)
proc = subprocess.Popen(mat2_binary, stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [-c | -s | -L] [files [files ...]]', stdout)
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]',
stdout)
self.assertIn(b'[--unknown-members policy] [-s | -L]', stdout)
class TestVersion(unittest.TestCase):
def test_version(self):
proc = subprocess.Popen(['./mat2', '--version'], stdout=subprocess.PIPE)
proc = subprocess.Popen(mat2_binary + ['--version'], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertTrue(stdout.startswith(b'MAT2 '))
class TestExclusiveArgs(unittest.TestCase):
def test_version(self):
proc = subprocess.Popen(['./mat2', '-s', '-c'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate()
self.assertIn(b'mat2: error: argument -c/--check: not allowed with argument -s/--show', stderr)
class TestDependencies(unittest.TestCase):
def test_dependencies(self):
proc = subprocess.Popen(mat2_binary + ['--check-dependencies'], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertTrue(b'MAT2' in stdout)
class TestReturnValue(unittest.TestCase):
def test_nonzero(self):
ret = subprocess.call(['./mat2', './mat2'], stdout=subprocess.DEVNULL)
ret = subprocess.call(mat2_binary + ['mat2'], stdout=subprocess.DEVNULL)
self.assertEqual(255, ret)
ret = subprocess.call(['./mat2', '--whololo'], stderr=subprocess.DEVNULL)
ret = subprocess.call(mat2_binary + ['--whololo'], stderr=subprocess.DEVNULL)
self.assertEqual(2, ret)
def test_zero(self):
ret = subprocess.call(['./mat2'], stdout=subprocess.DEVNULL)
ret = subprocess.call(mat2_binary, stdout=subprocess.DEVNULL)
self.assertEqual(0, ret)
ret = subprocess.call(['./mat2', '--show', './mat2'], stdout=subprocess.DEVNULL)
ret = subprocess.call(mat2_binary + ['--show', 'mat2'], stdout=subprocess.DEVNULL)
self.assertEqual(0, ret)
class TestCleanFolder(unittest.TestCase):
def test_jpg(self):
try:
os.mkdir('./tests/data/folder/')
except FileExistsError:
pass
shutil.copy('./tests/data/dirty.jpg', './tests/data/folder/clean1.jpg')
shutil.copy('./tests/data/dirty.jpg', './tests/data/folder/clean2.jpg')
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/folder/'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: Created with GIMP', stdout)
proc = subprocess.Popen(mat2_binary + ['./tests/data/folder/'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
os.remove('./tests/data/folder/clean1.jpg')
os.remove('./tests/data/folder/clean2.jpg')
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/folder/'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertNotIn(b'Comment: Created with GIMP', stdout)
self.assertIn(b'No metadata found', stdout)
shutil.rmtree('./tests/data/folder/')
class TestCleanMeta(unittest.TestCase):
def test_jpg(self):
shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
proc = subprocess.Popen(['./mat2', '--show', './tests/data/clean.jpg'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/clean.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: Created with GIMP', stdout)
proc = subprocess.Popen(['./mat2', './tests/data/clean.jpg'],
proc = subprocess.Popen(mat2_binary + ['./tests/data/clean.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
proc = subprocess.Popen(['./mat2', '--show', './tests/data/clean.cleaned.jpg'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/clean.cleaned.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertNotIn(b'Comment: Created with GIMP', stdout)
@@ -69,32 +109,34 @@ class TestCleanMeta(unittest.TestCase):
class TestIsSupported(unittest.TestCase):
def test_pdf(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.pdf'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.pdf'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertNotIn(b"isn't supported", stdout)
class TestGetMeta(unittest.TestCase):
maxDiff = None
def test_pdf(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.pdf'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.pdf'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'producer: pdfTeX-1.40.14', stdout)
self.assertIn(b'Producer: pdfTeX-1.40.14', stdout)
def test_png(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.png'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.png'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: This is a comment, be careful!', stdout)
def test_jpg(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.jpg'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: Created with GIMP', stdout)
def test_docx(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.docx'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.docx'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Application: LibreOffice/5.4.5.1$Linux_X86_64', stdout)
@@ -102,7 +144,7 @@ class TestGetMeta(unittest.TestCase):
self.assertIn(b'revision: 1', stdout)
def test_odt(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.odt'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.odt'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'generator: LibreOffice/3.3$Unix', stdout)
@@ -110,25 +152,32 @@ class TestGetMeta(unittest.TestCase):
self.assertIn(b'date_time: 2011-07-26 02:40:16', stdout)
def test_mp3(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.mp3'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.mp3'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'TALB: harmfull', stdout)
self.assertIn(b'COMM::: Thank you for using MAT !', stdout)
def test_flac(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.flac'],
stdout=subprocess.PIPE)
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.flac'],
stdout=subprocess.PIPE, bufsize=0)
stdout, _ = proc.communicate()
self.assertIn(b'comments: Thank you for using MAT !', stdout)
self.assertIn(b'genre: Python', stdout)
self.assertIn(b'title: I am so', stdout)
def test_ogg(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.ogg'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.ogg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'comments: Thank you for using MAT !', stdout)
self.assertIn(b'genre: Python', stdout)
self.assertIn(b'i am a : various comment', stdout)
self.assertIn(b'artist: jvoisin', stdout)
class TestControlCharInjection(unittest.TestCase):
def test_jpg(self):
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/control_chars.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: GQ\n', stdout)


@@ -1,11 +1,54 @@
#!/usr/bin/python3
#!/usr/bin/env python3
import unittest
import shutil
import os
import logging
import zipfile
from libmat2 import pdf, images, audio, office, parser_factory, torrent, harmless
from libmat2 import pdf, images, audio, office, parser_factory, torrent
from libmat2 import harmless, video, html
# No need to log messages: should something go wrong,
# the testsuite _will_ fail.
logger = logging.getLogger()
logger.setLevel(logging.FATAL)
class TestInexistentFiles(unittest.TestCase):
def test_ro(self):
parser, mimetype = parser_factory.get_parser('/etc/passwd')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_notaccessible(self):
parser, mimetype = parser_factory.get_parser('/etc/shadow')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_folder(self):
parser, mimetype = parser_factory.get_parser('./tests/')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_inexistingfile(self):
parser, mimetype = parser_factory.get_parser('./tests/NONEXISTING_FILE')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_chardevice(self):
parser, mimetype = parser_factory.get_parser('/dev/zero')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_brokensymlink(self):
shutil.copy('./tests/test_libmat2.py', './tests/clean.py')
os.symlink('./tests/clean.py', './tests/SYMLINK')
os.remove('./tests/clean.py')
parser, mimetype = parser_factory.get_parser('./tests/SYMLINK')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
os.unlink('./tests/SYMLINK')
class TestUnsupportedFiles(unittest.TestCase):
def test_pdf(self):
@@ -15,6 +58,21 @@ class TestUnsupportedFiles(unittest.TestCase):
self.assertEqual(parser, None)
os.remove('./tests/clean.py')
class TestCorruptedEmbedded(unittest.TestCase):
def test_docx(self):
shutil.copy('./tests/data/embedded_corrupted.docx', './tests/data/clean.docx')
parser, _ = parser_factory.get_parser('./tests/data/clean.docx')
self.assertFalse(parser.remove_all())
self.assertIsNotNone(parser.get_meta())
os.remove('./tests/data/clean.docx')
def test_odt(self):
shutil.copy('./tests/data/embedded_corrupted.odt', './tests/data/clean.odt')
parser, _ = parser_factory.get_parser('./tests/data/clean.odt')
self.assertFalse(parser.remove_all())
self.assertTrue(parser.get_meta())
os.remove('./tests/data/clean.odt')
class TestExplicitelyUnsupportedFiles(unittest.TestCase):
def test_pdf(self):
@@ -25,6 +83,26 @@ class TestExplicitelyUnsupportedFiles(unittest.TestCase):
os.remove('./tests/data/clean.py')
class TestWrongContentTypesFileOffice(unittest.TestCase):
def test_office_incomplete(self):
shutil.copy('./tests/data/malformed_content_types.docx', './tests/data/clean.docx')
p = office.MSOfficeParser('./tests/data/clean.docx')
self.assertIsNotNone(p)
self.assertFalse(p.remove_all())
os.remove('./tests/data/clean.docx')
def test_office_broken(self):
shutil.copy('./tests/data/broken_xml_content_types.docx', './tests/data/clean.docx')
with self.assertRaises(ValueError):
office.MSOfficeParser('./tests/data/clean.docx')
os.remove('./tests/data/clean.docx')
def test_office_absent(self):
shutil.copy('./tests/data/no_content_types.docx', './tests/data/clean.docx')
with self.assertRaises(ValueError):
office.MSOfficeParser('./tests/data/clean.docx')
os.remove('./tests/data/clean.docx')
class TestCorruptedFiles(unittest.TestCase):
def test_pdf(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
@@ -40,21 +118,40 @@ class TestCorruptedFiles(unittest.TestCase):
def test_png2(self):
shutil.copy('./tests/test_libmat2.py', './tests/clean.png')
parser, mimetype = parser_factory.get_parser('./tests/clean.png')
parser, _ = parser_factory.get_parser('./tests/clean.png')
self.assertIsNone(parser)
os.remove('./tests/clean.png')
def test_torrent(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.torrent')
p = torrent.TorrentParser('./tests/data/clean.torrent')
self.assertFalse(p.remove_all())
expected = {'Unknown meta': 'Unable to parse torrent file "./tests/data/clean.torrent".'}
self.assertEqual(p.get_meta(), expected)
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "a") as f:
f.write("trailing garbage")
p = torrent.TorrentParser('./tests/data/clean.torrent')
self.assertEqual(p.get_meta(), expected)
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "w") as f:
f.write("i-0e")
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "w") as f:
f.write("i00e")
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "w") as f:
f.write("01:AAAAAAAAA")
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "w") as f:
f.write("1:aaa")
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
os.remove('./tests/data/clean.torrent')
def test_odg(self):
@@ -65,23 +162,110 @@ class TestCorruptedFiles(unittest.TestCase):
def test_bmp(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.bmp')
harmless.HarmlessParser('./tests/data/clean.bmp')
ret = harmless.HarmlessParser('./tests/data/clean.bmp')
self.assertIsNotNone(ret)
os.remove('./tests/data/clean.bmp')
def test_docx(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.docx')
with self.assertRaises(ValueError):
office.MSOfficeParser('./tests/data/clean.docx')
office.MSOfficeParser('./tests/data/clean.docx')
os.remove('./tests/data/clean.docx')
def test_flac(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.flac')
with self.assertRaises(ValueError):
audio.FLACParser('./tests/data/clean.flac')
audio.FLACParser('./tests/data/clean.flac')
os.remove('./tests/data/clean.flac')
def test_mp3(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.mp3')
with self.assertRaises(ValueError):
audio.MP3Parser('./tests/data/clean.mp3')
audio.MP3Parser('./tests/data/clean.mp3')
os.remove('./tests/data/clean.mp3')
def test_jpg(self):
shutil.copy('./tests/data/dirty.mp3', './tests/data/clean.jpg')
with self.assertRaises(ValueError):
images.JPGParser('./tests/data/clean.jpg')
os.remove('./tests/data/clean.jpg')
def test_png_lightweight(self):
return
shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.png')
p = images.PNGParser('./tests/data/clean.png')
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.png')
def test_avi(self):
try:
video._get_ffmpeg_path()
except RuntimeError:
raise unittest.SkipTest
shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.avi')
p = video.AVIParser('./tests/data/clean.avi')
self.assertFalse(p.remove_all())
os.remove('./tests/data/clean.avi')
def test_avi_injection(self):
try:
video._get_ffmpeg_path()
except RuntimeError:
raise unittest.SkipTest
shutil.copy('./tests/data/dirty.torrent', './tests/data/--output.avi')
p = video.AVIParser('./tests/data/--output.avi')
self.assertFalse(p.remove_all())
os.remove('./tests/data/--output.avi')
def test_zip(self):
with zipfile.ZipFile('./tests/data/dirty.zip', 'w') as zout:
zout.write('./tests/data/dirty.flac')
zout.write('./tests/data/dirty.docx')
zout.write('./tests/data/dirty.jpg')
zout.write('./tests/data/embedded_corrupted.docx')
p, mimetype = parser_factory.get_parser('./tests/data/dirty.zip')
self.assertEqual(mimetype, 'application/zip')
meta = p.get_meta()
self.assertEqual(meta['tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
self.assertFalse(p.remove_all())
os.remove('./tests/data/dirty.zip')
def test_html(self):
shutil.copy('./tests/data/dirty.html', './tests/data/clean.html')
with open('./tests/data/clean.html', 'a') as f:
f.write('<open>but not</closed>')
with self.assertRaises(ValueError):
html.HTMLParser('./tests/data/clean.html')
os.remove('./tests/data/clean.html')
# Yes, we're able to deal with malformed html :/
shutil.copy('./tests/data/dirty.html', './tests/data/clean.html')
with open('./tests/data/clean.html', 'a') as f:
f.write('<meta name=\'this" is="weird"/>')
p = html.HTMLParser('./tests/data/clean.html')
self.assertTrue(p.remove_all())
p = html.HTMLParser('./tests/data/clean.cleaned.html')
self.assertEqual(p.get_meta(), {})
os.remove('./tests/data/clean.html')
os.remove('./tests/data/clean.cleaned.html')
with open('./tests/data/clean.html', 'w') as f:
f.write('</close>')
with self.assertRaises(ValueError):
html.HTMLParser('./tests/data/clean.html')
os.remove('./tests/data/clean.html')
with open('./tests/data/clean.html', 'w') as f:
f.write('<notclosed>')
p = html.HTMLParser('./tests/data/clean.html')
with self.assertRaises(ValueError):
p.get_meta()
p = html.HTMLParser('./tests/data/clean.html')
with self.assertRaises(ValueError):
p.remove_all()
os.remove('./tests/data/clean.html')

tests/test_deep_cleaning.py Normal file

@@ -0,0 +1,135 @@
#!/usr/bin/env python3
import unittest
import shutil
import os
import zipfile
import tempfile
from libmat2 import office, parser_factory
class TestZipMetadata(unittest.TestCase):
def __check_deep_meta(self, p):
tempdir = tempfile.mkdtemp()
zipin = zipfile.ZipFile(p.filename)
zipin.extractall(tempdir)
for subdir, dirs, files in os.walk(tempdir):
for f in files:
complete_path = os.path.join(subdir, f)
inside_p, _ = parser_factory.get_parser(complete_path)
if inside_p is None:
continue
self.assertEqual(inside_p.get_meta(), {})
shutil.rmtree(tempdir)
def __check_zip_meta(self, p):
zipin = zipfile.ZipFile(p.filename)
for item in zipin.infolist():
self.assertEqual(item.comment, b'')
self.assertEqual(item.date_time, (1980, 1, 1, 0, 0, 0))
self.assertEqual(item.create_system, 3) # 3 is UNIX
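# For reference, a sketch of writing a zip member that satisfies the checks
# above (an assumed approach, not necessarily the exact code in libmat2):
import zipfile
with zipfile.ZipFile('clean.zip', 'w') as zout:
    zinfo = zipfile.ZipInfo('mimetype', date_time=(1980, 1, 1, 0, 0, 0))
    zinfo.create_system = 3   # 3 means UNIX
    zinfo.comment = b''
    zout.writestr(zinfo, b'application/vnd.oasis.opendocument.text')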
def test_office(self):
shutil.copy('./tests/data/dirty.docx', './tests/data/clean.docx')
p = office.MSOfficeParser('./tests/data/clean.docx')
meta = p.get_meta()
self.assertIsNotNone(meta)
self.assertEqual(meta['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
ret = p.remove_all()
self.assertTrue(ret)
p = office.MSOfficeParser('./tests/data/clean.cleaned.docx')
self.assertEqual(p.get_meta(), {})
self.__check_zip_meta(p)
self.__check_deep_meta(p)
os.remove('./tests/data/clean.docx')
os.remove('./tests/data/clean.cleaned.docx')
def test_libreoffice(self):
shutil.copy('./tests/data/dirty.odt', './tests/data/clean.odt')
p = office.LibreOfficeParser('./tests/data/clean.odt')
meta = p.get_meta()
self.assertIsNotNone(meta)
ret = p.remove_all()
self.assertTrue(ret)
p = office.LibreOfficeParser('./tests/data/clean.cleaned.odt')
self.assertEqual(p.get_meta(), {})
self.__check_zip_meta(p)
self.__check_deep_meta(p)
os.remove('./tests/data/clean.odt')
os.remove('./tests/data/clean.cleaned.odt')
class TestZipOrder(unittest.TestCase):
def test_libreoffice(self):
shutil.copy('./tests/data/dirty.odt', './tests/data/clean.odt')
p = office.LibreOfficeParser('./tests/data/clean.odt')
meta = p.get_meta()
self.assertIsNotNone(meta)
is_unordered = False
with zipfile.ZipFile('./tests/data/clean.odt') as zin:
previous_name = ''
for item in zin.infolist():
if previous_name == '':
previous_name = item.filename
continue
elif item.filename < previous_name:
is_unordered = True
break
self.assertTrue(is_unordered)
ret = p.remove_all()
self.assertTrue(ret)
with zipfile.ZipFile('./tests/data/clean.cleaned.odt') as zin:
previous_name = ''
for item in zin.infolist():
if previous_name == '':
previous_name = item.filename
continue
self.assertGreaterEqual(item.filename, previous_name)
os.remove('./tests/data/clean.odt')
os.remove('./tests/data/clean.cleaned.odt')
class TestRsidRemoval(unittest.TestCase):
def test_office(self):
shutil.copy('./tests/data/office_revision_session_ids.docx', './tests/data/clean.docx')
p = office.MSOfficeParser('./tests/data/clean.docx')
meta = p.get_meta()
self.assertIsNotNone(meta)
how_many_rsid = False
with zipfile.ZipFile('./tests/data/clean.docx') as zin:
for item in zin.infolist():
if not item.filename.endswith('.xml'):
continue
num = zin.read(item).decode('utf-8').lower().count('w:rsid')
how_many_rsid += num
self.assertEqual(how_many_rsid, 11)
ret = p.remove_all()
self.assertTrue(ret)
with zipfile.ZipFile('./tests/data/clean.cleaned.docx') as zin:
for item in zin.infolist():
if not item.filename.endswith('.xml'):
continue
num = zin.read(item).decode('utf-8').lower().count('w:rsid')
self.assertEqual(num, 0)
os.remove('./tests/data/clean.docx')
os.remove('./tests/data/clean.cleaned.docx')


@@ -1,12 +1,23 @@
#!/usr/bin/python3
#!/usr/bin/env python3
import unittest
import shutil
import os
import zipfile
import tempfile
from libmat2 import pdf, images, audio, office, parser_factory, torrent, harmless
from libmat2 import check_dependencies, video, archive, html
class TestCheckDependencies(unittest.TestCase):
def test_deps(self):
try:
ret = check_dependencies()
except RuntimeError:
return # this happens if not every dependency is installed
for value in ret.values():
self.assertTrue(value)
class TestParserFactory(unittest.TestCase):
@@ -26,6 +37,32 @@ class TestParameterInjection(unittest.TestCase):
self.assertEqual(meta['ModifyDate'], "2018:03:20 21:59:25")
os.remove('-ver')
def test_ffmpeg_injection(self):
try:
video._get_ffmpeg_path()
except RuntimeError:
raise unittest.SkipTest
shutil.copy('./tests/data/dirty.avi', './--output')
p = video.AVIParser('--output')
meta = p.get_meta()
self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
os.remove('--output')
def test_ffmpeg_injection_complete_path(self):
try:
video._get_ffmpeg_path()
except RuntimeError:
raise unittest.SkipTest
shutil.copy('./tests/data/dirty.avi', './tests/data/ --output.avi')
p = video.AVIParser('./tests/data/ --output.avi')
meta = p.get_meta()
self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
self.assertTrue(p.remove_all())
os.remove('./tests/data/ --output.avi')
os.remove('./tests/data/ --output.cleaned.avi')
class TestUnsupportedEmbeddedFiles(unittest.TestCase):
def test_odt_with_svg(self):
@@ -48,8 +85,8 @@ class TestGetMeta(unittest.TestCase):
self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
self.assertEqual(meta['creator'], "'Certified by IEEE PDFeXpress at 03/19/2016 2:56:07 AM'")
self.assertEqual(meta['DocumentID'], "uuid:4a1a79c8-404e-4d38-9580-5bc081036e61")
self.assertEqual(meta['PTEX.Fullbanner'], "This is pdfTeX, Version " \
"3.1415926-2.5-1.40.14 (TeX Live 2013/Debian) kpathsea " \
self.assertEqual(meta['PTEX.Fullbanner'], "This is pdfTeX, Version "
"3.1415926-2.5-1.40.14 (TeX Live 2013/Debian) kpathsea "
"version 6.1.1")
def test_torrent(self):
@@ -89,20 +126,26 @@ class TestGetMeta(unittest.TestCase):
p = audio.FLACParser('./tests/data/dirty.flac')
meta = p.get_meta()
self.assertEqual(meta['title'], 'I am so')
self.assertEqual(meta['Cover 0'], {'Comment': 'Created with GIMP'})
def test_docx(self):
p = office.MSOfficeParser('./tests/data/dirty.docx')
meta = p.get_meta()
self.assertEqual(meta['cp:lastModifiedBy'], 'Julien Voisin')
self.assertEqual(meta['dc:creator'], 'julien voisin')
self.assertEqual(meta['Application'], 'LibreOffice/5.4.5.1$Linux_X86_64 LibreOffice_project/40m0$Build-1')
self.assertEqual(meta['docProps/core.xml']['cp:lastModifiedBy'], 'Julien Voisin')
self.assertEqual(meta['docProps/core.xml']['dc:creator'], 'julien voisin')
self.assertEqual(meta['docProps/app.xml']['Application'], 'LibreOffice/5.4.5.1$Linux_X86_64 LibreOffice_project/40m0$Build-1')
def test_libreoffice(self):
p = office.LibreOfficeParser('./tests/data/dirty.odt')
meta = p.get_meta()
self.assertEqual(meta['meta:initial-creator'], 'jvoisin ')
self.assertEqual(meta['meta:creation-date'], '2011-07-26T03:27:48')
self.assertEqual(meta['meta:generator'], 'LibreOffice/3.3$Unix LibreOffice_project/330m19$Build-202')
self.assertEqual(meta['meta.xml']['meta:initial-creator'], 'jvoisin ')
self.assertEqual(meta['meta.xml']['meta:creation-date'], '2011-07-26T03:27:48')
self.assertEqual(meta['meta.xml']['meta:generator'], 'LibreOffice/3.3$Unix LibreOffice_project/330m19$Build-202')
p = office.LibreOfficeParser('./tests/data/weird_producer.odt')
meta = p.get_meta()
self.assertEqual(meta['mimetype']['create_system'], 'Windows')
self.assertEqual(meta['mimetype']['comment'], b'YAY FOR COMMENTS')
def test_txt(self):
p, mimetype = parser_factory.get_parser('./tests/data/dirty.txt')
@@ -110,6 +153,29 @@ class TestGetMeta(unittest.TestCase):
meta = p.get_meta()
self.assertEqual(meta, {})
def test_zip(self):
with zipfile.ZipFile('./tests/data/dirty.zip', 'w') as zout:
zout.write('./tests/data/dirty.flac')
zout.write('./tests/data/dirty.docx')
zout.write('./tests/data/dirty.jpg')
p, mimetype = parser_factory.get_parser('./tests/data/dirty.zip')
self.assertEqual(mimetype, 'application/zip')
meta = p.get_meta()
self.assertEqual(meta['tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
os.remove('./tests/data/dirty.zip')
def test_wmv(self):
p, mimetype = parser_factory.get_parser('./tests/data/dirty.wmv')
self.assertEqual(mimetype, 'video/x-ms-wmv')
meta = p.get_meta()
self.assertEqual(meta['EncodingSettings'], 'Lavf52.103.0')
def test_gif(self):
p, mimetype = parser_factory.get_parser('./tests/data/dirty.gif')
self.assertEqual(mimetype, 'image/gif')
meta = p.get_meta()
self.assertEqual(meta['Comment'], 'this is a test comment')
class TestRemovingThumbnails(unittest.TestCase):
def test_odt(self):
@@ -169,105 +235,6 @@ class TestRevisionsCleaning(unittest.TestCase):
os.remove('./tests/data/revision_clean.docx')
os.remove('./tests/data/revision_clean.cleaned.docx')
class TestDeepCleaning(unittest.TestCase):
def __check_deep_meta(self, p):
tempdir = tempfile.mkdtemp()
zipin = zipfile.ZipFile(p.filename)
zipin.extractall(tempdir)
for subdir, dirs, files in os.walk(tempdir):
for f in files:
complete_path = os.path.join(subdir, f)
inside_p, _ = parser_factory.get_parser(complete_path)
if inside_p is None:
continue
print('[+] %s is clean inside %s' %(complete_path, p.filename))
self.assertEqual(inside_p.get_meta(), {})
shutil.rmtree(tempdir)
def __check_zip_meta(self, p):
zipin = zipfile.ZipFile(p.filename)
for item in zipin.infolist():
self.assertEqual(item.comment, b'')
self.assertEqual(item.date_time, (1980, 1, 1, 0, 0, 0))
self.assertEqual(item.create_system, 3) # 3 is UNIX
def test_office(self):
shutil.copy('./tests/data/dirty.docx', './tests/data/clean.docx')
p = office.MSOfficeParser('./tests/data/clean.docx')
meta = p.get_meta()
self.assertIsNotNone(meta)
ret = p.remove_all()
self.assertTrue(ret)
p = office.MSOfficeParser('./tests/data/clean.cleaned.docx')
self.assertEqual(p.get_meta(), {})
self.__check_zip_meta(p)
self.__check_deep_meta(p)
os.remove('./tests/data/clean.docx')
os.remove('./tests/data/clean.cleaned.docx')
def test_libreoffice(self):
shutil.copy('./tests/data/dirty.odt', './tests/data/clean.odt')
p = office.LibreOfficeParser('./tests/data/clean.odt')
meta = p.get_meta()
self.assertIsNotNone(meta)
ret = p.remove_all()
self.assertTrue(ret)
p = office.LibreOfficeParser('./tests/data/clean.cleaned.odt')
self.assertEqual(p.get_meta(), {})
self.__check_zip_meta(p)
self.__check_deep_meta(p)
os.remove('./tests/data/clean.odt')
os.remove('./tests/data/clean.cleaned.odt')
class TestLightWeightCleaning(unittest.TestCase):
def test_pdf(self):
shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
p = pdf.PDFParser('./tests/data/clean.pdf')
meta = p.get_meta()
self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
ret = p.remove_all_lightweight()
self.assertTrue(ret)
p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
self.assertEqual(p.get_meta(), expected_meta)
os.remove('./tests/data/clean.pdf')
os.remove('./tests/data/clean.cleaned.pdf')
def test_png(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
p = images.PNGParser('./tests/data/clean.png')
meta = p.get_meta()
self.assertEqual(meta['Comment'], 'This is a comment, be careful!')
ret = p.remove_all_lightweight()
self.assertTrue(ret)
p = images.PNGParser('./tests/data/clean.cleaned.png')
self.assertEqual(p.get_meta(), {})
os.remove('./tests/data/clean.png')
os.remove('./tests/data/clean.cleaned.png')
class TestCleaning(unittest.TestCase):
def test_pdf(self):
shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
@@ -282,9 +249,11 @@ class TestCleaning(unittest.TestCase):
p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
self.assertEqual(p.get_meta(), expected_meta)
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.pdf')
os.remove('./tests/data/clean.cleaned.pdf')
os.remove('./tests/data/clean.cleaned.cleaned.pdf')
def test_png(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
@@ -298,9 +267,11 @@ class TestCleaning(unittest.TestCase):
p = images.PNGParser('./tests/data/clean.cleaned.png')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.png')
os.remove('./tests/data/clean.cleaned.png')
os.remove('./tests/data/clean.cleaned.cleaned.png')
def test_jpg(self):
shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
@@ -314,9 +285,11 @@ class TestCleaning(unittest.TestCase):
p = images.JPGParser('./tests/data/clean.cleaned.jpg')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.jpg')
os.remove('./tests/data/clean.cleaned.jpg')
os.remove('./tests/data/clean.cleaned.cleaned.jpg')
def test_mp3(self):
shutil.copy('./tests/data/dirty.mp3', './tests/data/clean.mp3')
@@ -330,9 +303,11 @@ class TestCleaning(unittest.TestCase):
p = audio.MP3Parser('./tests/data/clean.cleaned.mp3')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.mp3')
os.remove('./tests/data/clean.cleaned.mp3')
os.remove('./tests/data/clean.cleaned.cleaned.mp3')
def test_ogg(self):
shutil.copy('./tests/data/dirty.ogg', './tests/data/clean.ogg')
@@ -346,9 +321,11 @@ class TestCleaning(unittest.TestCase):
p = audio.OGGParser('./tests/data/clean.cleaned.ogg')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.ogg')
os.remove('./tests/data/clean.cleaned.ogg')
os.remove('./tests/data/clean.cleaned.cleaned.ogg')
def test_flac(self):
shutil.copy('./tests/data/dirty.flac', './tests/data/clean.flac')
@@ -362,9 +339,11 @@ class TestCleaning(unittest.TestCase):
p = audio.FLACParser('./tests/data/clean.cleaned.flac')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.flac')
os.remove('./tests/data/clean.cleaned.flac')
os.remove('./tests/data/clean.cleaned.cleaned.flac')
def test_office(self):
shutil.copy('./tests/data/dirty.docx', './tests/data/clean.docx')
@@ -378,10 +357,11 @@ class TestCleaning(unittest.TestCase):
p = office.MSOfficeParser('./tests/data/clean.cleaned.docx')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.docx')
os.remove('./tests/data/clean.cleaned.docx')
os.remove('./tests/data/clean.cleaned.cleaned.docx')
def test_libreoffice(self):
shutil.copy('./tests/data/dirty.odt', './tests/data/clean.odt')
@@ -395,9 +375,11 @@ class TestCleaning(unittest.TestCase):
        p = office.LibreOfficeParser('./tests/data/clean.cleaned.odt')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.odt')
        os.remove('./tests/data/clean.cleaned.odt')
        os.remove('./tests/data/clean.cleaned.cleaned.odt')

    def test_tiff(self):
        shutil.copy('./tests/data/dirty.tiff', './tests/data/clean.tiff')
@@ -411,9 +393,11 @@ class TestCleaning(unittest.TestCase):
        p = images.TiffParser('./tests/data/clean.cleaned.tiff')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.tiff')
        os.remove('./tests/data/clean.cleaned.tiff')
        os.remove('./tests/data/clean.cleaned.cleaned.tiff')

    def test_bmp(self):
        shutil.copy('./tests/data/dirty.bmp', './tests/data/clean.bmp')
@@ -427,9 +411,11 @@ class TestCleaning(unittest.TestCase):
        p = harmless.HarmlessParser('./tests/data/clean.cleaned.bmp')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.bmp')
        os.remove('./tests/data/clean.cleaned.bmp')
        os.remove('./tests/data/clean.cleaned.cleaned.bmp')

    def test_torrent(self):
        shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.torrent')
@@ -443,42 +429,47 @@ class TestCleaning(unittest.TestCase):
        p = torrent.TorrentParser('./tests/data/clean.cleaned.torrent')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.torrent')
        os.remove('./tests/data/clean.cleaned.torrent')
        os.remove('./tests/data/clean.cleaned.cleaned.torrent')
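
    # Metadata of archive-based documents is keyed per member file:
    # meta['meta.xml'][...] below replaces the former flat meta[...] access.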
    def test_odf(self):
        shutil.copy('./tests/data/dirty.odf', './tests/data/clean.odf')
        p = office.LibreOfficeParser('./tests/data/clean.odf')
        meta = p.get_meta()
        self.assertEqual(meta['meta:creation-date'], '2018-04-23T00:18:59.438231281')
        self.assertEqual(meta['meta.xml']['meta:creation-date'], '2018-04-23T00:18:59.438231281')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = office.LibreOfficeParser('./tests/data/clean.cleaned.odf')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.odf')
        os.remove('./tests/data/clean.cleaned.odf')
        os.remove('./tests/data/clean.cleaned.cleaned.odf')

    def test_odg(self):
        shutil.copy('./tests/data/dirty.odg', './tests/data/clean.odg')
        p = office.LibreOfficeParser('./tests/data/clean.odg')
        meta = p.get_meta()
        self.assertEqual(meta['dc:date'], '2018-04-23T00:26:59.385838550')
        self.assertEqual(meta['meta.xml']['dc:date'], '2018-04-23T00:26:59.385838550')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = office.LibreOfficeParser('./tests/data/clean.cleaned.odg')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.odg')
        os.remove('./tests/data/clean.cleaned.odg')
        os.remove('./tests/data/clean.cleaned.cleaned.odg')

    def test_txt(self):
        shutil.copy('./tests/data/dirty.txt', './tests/data/clean.txt')
@@ -492,6 +483,134 @@ class TestCleaning(unittest.TestCase):
        p = harmless.HarmlessParser('./tests/data/clean.cleaned.txt')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.txt')
        os.remove('./tests/data/clean.cleaned.txt')
        os.remove('./tests/data/clean.cleaned.cleaned.txt')
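
    # The video tests need FFmpeg, which is an optional dependency:
    # they are skipped when it is not installed.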
    def test_avi(self):
        try:
            video._get_ffmpeg_path()
        except RuntimeError:
            raise unittest.SkipTest
        shutil.copy('./tests/data/dirty.avi', './tests/data/clean.avi')
        p = video.AVIParser('./tests/data/clean.avi')
        meta = p.get_meta()
        self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = video.AVIParser('./tests/data/clean.cleaned.avi')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.avi')
        os.remove('./tests/data/clean.cleaned.avi')
        os.remove('./tests/data/clean.cleaned.cleaned.avi')
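
    # The zip fixture is built on the fly; its metadata check recurses into
    # the embedded docx, down to the image it contains.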
    def test_zip(self):
        with zipfile.ZipFile('./tests/data/dirty.zip', 'w') as zout:
            zout.write('./tests/data/dirty.flac')
            zout.write('./tests/data/dirty.docx')
            zout.write('./tests/data/dirty.jpg')
        p = archive.ZipParser('./tests/data/dirty.zip')
        meta = p.get_meta()
        self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = archive.ZipParser('./tests/data/dirty.cleaned.zip')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/dirty.zip')
        os.remove('./tests/data/dirty.cleaned.zip')
        os.remove('./tests/data/dirty.cleaned.cleaned.zip')

    def test_mp4(self):
        try:
            video._get_ffmpeg_path()
        except RuntimeError:
            raise unittest.SkipTest
        shutil.copy('./tests/data/dirty.mp4', './tests/data/clean.mp4')
        p = video.MP4Parser('./tests/data/clean.mp4')
        meta = p.get_meta()
        self.assertEqual(meta['Encoder'], 'HandBrake 0.9.4 2009112300')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = video.MP4Parser('./tests/data/clean.cleaned.mp4')
        self.assertNotIn('Encoder', p.get_meta())
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.mp4')
        os.remove('./tests/data/clean.cleaned.mp4')
        os.remove('./tests/data/clean.cleaned.cleaned.mp4')

    def test_wmv(self):
        try:
            video._get_ffmpeg_path()
        except RuntimeError:
            raise unittest.SkipTest
        shutil.copy('./tests/data/dirty.wmv', './tests/data/clean.wmv')
        p = video.WMVParser('./tests/data/clean.wmv')
        meta = p.get_meta()
        self.assertEqual(meta['EncodingSettings'], 'Lavf52.103.0')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = video.WMVParser('./tests/data/clean.cleaned.wmv')
        self.assertNotIn('EncodingSettings', p.get_meta())
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.wmv')
        os.remove('./tests/data/clean.cleaned.wmv')
        os.remove('./tests/data/clean.cleaned.cleaned.wmv')

    def test_gif(self):
        shutil.copy('./tests/data/dirty.gif', './tests/data/clean.gif')
        p = images.GIFParser('./tests/data/clean.gif')
        meta = p.get_meta()
        self.assertEqual(meta['Comment'], 'this is a test comment')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = images.GIFParser('./tests/data/clean.cleaned.gif')
        self.assertNotIn('Comment', p.get_meta())  # the comment checked above must be gone
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.gif')
        os.remove('./tests/data/clean.cleaned.gif')
        os.remove('./tests/data/clean.cleaned.cleaned.gif')

    def test_html(self):
        shutil.copy('./tests/data/dirty.html', './tests/data/clean.html')
        p = html.HTMLParser('./tests/data/clean.html')
        meta = p.get_meta()
        self.assertEqual(meta['author'], 'jvoisin')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = html.HTMLParser('./tests/data/clean.cleaned.html')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.html')
        os.remove('./tests/data/clean.cleaned.html')
        os.remove('./tests/data/clean.cleaned.cleaned.html')


@@ -0,0 +1,106 @@
#!/usr/bin/env python3
import unittest
import shutil
import os
from libmat2 import pdf, images, torrent
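

# Lightweight cleaning sets parser.lightweight_cleaning = True before calling
# remove_all(); it is less thorough and may keep some harmless technical
# metadata behind, as the tiff test below shows.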
class TestLightWeightCleaning(unittest.TestCase):
    def test_pdf(self):
        shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
        p = pdf.PDFParser('./tests/data/clean.pdf')
        meta = p.get_meta()
        self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
        expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
        self.assertEqual(p.get_meta(), expected_meta)
        os.remove('./tests/data/clean.pdf')
        os.remove('./tests/data/clean.cleaned.pdf')

    def test_png(self):
        shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
        p = images.PNGParser('./tests/data/clean.png')
        meta = p.get_meta()
        self.assertEqual(meta['Comment'], 'This is a comment, be careful!')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = images.PNGParser('./tests/data/clean.cleaned.png')
        self.assertEqual(p.get_meta(), {})
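
        # Cleaning the same file a second time, while its .cleaned output
        # already exists, must still succeed.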
        p = images.PNGParser('./tests/data/clean.png')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        os.remove('./tests/data/clean.png')
        os.remove('./tests/data/clean.cleaned.png')

    def test_jpg(self):
        shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
        p = images.JPGParser('./tests/data/clean.jpg')
        meta = p.get_meta()
        self.assertEqual(meta['Comment'], 'Created with GIMP')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = images.JPGParser('./tests/data/clean.cleaned.jpg')
        self.assertEqual(p.get_meta(), {})
        os.remove('./tests/data/clean.jpg')
        os.remove('./tests/data/clean.cleaned.jpg')

    def test_torrent(self):
        shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.torrent')
        p = torrent.TorrentParser('./tests/data/clean.torrent')
        meta = p.get_meta()
        self.assertEqual(meta['created by'], b'mktorrent 1.0')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = torrent.TorrentParser('./tests/data/clean.cleaned.torrent')
        self.assertEqual(p.get_meta(), {})
        os.remove('./tests/data/clean.torrent')
        os.remove('./tests/data/clean.cleaned.torrent')

    def test_tiff(self):
        shutil.copy('./tests/data/dirty.tiff', './tests/data/clean.tiff')
        p = images.TiffParser('./tests/data/clean.tiff')
        meta = p.get_meta()
        self.assertEqual(meta['ImageDescription'], 'OLYMPUS DIGITAL CAMERA ')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = images.TiffParser('./tests/data/clean.cleaned.tiff')
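        # Lightweight cleaning leaves these harmless technical tags in place: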
        self.assertEqual(p.get_meta(),
                         {
                             'Orientation': 'Horizontal (normal)',
                             'ResolutionUnit': 'inches',
                             'XResolution': 72,
                             'YResolution': 72
                         }
                        )
        os.remove('./tests/data/clean.tiff')
        os.remove('./tests/data/clean.cleaned.tiff')

tests/test_policy.py

@@ -0,0 +1,31 @@
#!/usr/bin/env python3
import unittest
import shutil
import os
from libmat2 import office, UnknownMemberPolicy
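

# UnknownMemberPolicy tells the office parser what to do with archive members
# it does not know how to clean: OMIT drops them from the output, while KEEP
# keeps them as-is.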
class TestPolicy(unittest.TestCase):
    def test_policy_omit(self):
        shutil.copy('./tests/data/embedded.docx', './tests/data/clean.docx')
        p = office.MSOfficeParser('./tests/data/clean.docx')
        p.unknown_member_policy = UnknownMemberPolicy.OMIT
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.docx')
        os.remove('./tests/data/clean.cleaned.docx')

    def test_policy_keep(self):
        shutil.copy('./tests/data/embedded.docx', './tests/data/clean.docx')
        p = office.MSOfficeParser('./tests/data/clean.docx')
        p.unknown_member_policy = UnknownMemberPolicy.KEEP
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.docx')
        os.remove('./tests/data/clean.cleaned.docx')

    def test_policy_unknown(self):
        shutil.copy('./tests/data/embedded.docx', './tests/data/clean.docx')
        p = office.MSOfficeParser('./tests/data/clean.docx')
        with self.assertRaises(ValueError):
            p.unknown_member_policy = UnknownMemberPolicy('unknown_policy_name_totally_invalid')
        os.remove('./tests/data/clean.docx')