mirror of https://0xacab.org/jvoisin/mat2 synced 2025-10-06 16:42:57 +02:00

179 Commits
0.1.3 ... 0.7.0

Author SHA1 Message Date
jvoisin
6b45064c78 Bump the changelog 2019-02-17 17:02:17 +01:00
jvoisin
a81b7658a8 Make the mandatory metadata warning generic
This should close #95.
2019-02-10 21:46:13 +01:00
jvoisin
6e63e03b86 Streamline a bit the previous commit 2019-02-09 15:23:16 +01:00
Poncho
a71488d459 bind mount /etc/ld.so.cache to the sandbox
without /etc/ld.so.cache available in the sandbox, tests fail on gentoo with:
/usr/bin/ffmpeg: error while loading shared libraries: libstdc++.so.6:
    cannot open shared object file: No such file or directory
2019-02-09 09:49:51 +01:00
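As an aside, here is a minimal sketch (not mat2's actual code) of how an external tool such as ffmpeg could be wrapped in bubblewrap, with `/etc/ld.so.cache` bind-mounted read-only so dynamically linked binaries can resolve their shared libraries inside the sandbox; the function name and the exact set of binds are illustrative and vary between distributions:

```python
import shutil
import subprocess

def run_sandboxed(command: list) -> subprocess.CompletedProcess:
    """Run `command` under bwrap when available, otherwise directly."""
    bwrap = shutil.which('bwrap')
    if bwrap is None:  # bubblewrap isn't installed: fall back to a plain call
        return subprocess.run(command, check=True)
    sandbox = [
        bwrap,
        '--ro-bind', '/usr', '/usr',
        '--ro-bind', '/lib', '/lib',
        '--ro-bind', '/etc/ld.so.cache', '/etc/ld.so.cache',  # the bind discussed above
        '--dev', '/dev',
        '--unshare-all',
    ]
    return subprocess.run(sandbox + command, check=True)

run_sandboxed(['/usr/bin/ffmpeg', '-version'])
```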
jvoisin
6ef6aaa222 Improve a bit get_meta for libreoffice files 2019-02-08 23:23:56 +01:00
jvoisin
6cc034e81b Add support for html files 2019-02-08 23:05:18 +01:00
jvoisin
e1dd439fc8 Use of the archive refactoring for the office documents too 2019-02-07 22:19:37 +01:00
jvoisin
b9a62d798a Refactor a bit office get_meta handling
This should make it easier to get more metadata from
archive-based file formats.
2019-02-04 00:31:26 +01:00
jvoisin
54e50450ad Fix the return code on parsers' list display 2019-02-03 21:09:12 +01:00
jvoisin
433609f8ea Implement .gif support 2019-02-03 21:01:58 +01:00
intrigeri
e8c1bb0e3c Whenever possible, use bwrap for subprocesses
This should close #90
2019-02-03 19:18:41 +01:00
jvoisin
8b5d0c286c Document how to get the coverage from the testsuite 2019-02-03 18:33:25 +01:00
jvoisin
8e84ba547a Add support for wmv 2019-02-02 19:19:36 +01:00
jvoisin
812bf2553b Rename the internal class used by the nautilus extension
This should solve collisions with people like me who
are copy/pasting the documentation, creating conflicts
with other extensions that are doing the very same thing.
2019-01-16 23:10:17 +01:00
Alan
94cdca1ed2 Update debian packaging status 2018-12-15 17:05:37 +01:00
Alan
b755aba8ea Fix debian build instructions 2018-12-15 17:05:32 +01:00
jvoisin
edce78859b Add a note in the readme about -L and pdf 2018-12-08 18:39:56 +01:00
jvoisin
0ab17b973b mat2 is now available on pypi 2018-11-11 20:49:24 +01:00
jvoisin
389311475c Add a readme for the nautilus extension 2018-11-11 19:58:51 +01:00
jvoisin
505be24be9 Bump the changelog 2018-11-10 12:46:31 +01:00
jvoisin
ef8265e86a Remove a useless image 2018-11-10 10:54:13 +01:00
jvoisin
1d75451b77 Add some type annotations to the nautilus extension 2018-11-08 21:40:33 +01:00
jvoisin
dc35ef56c8 Add a missing file :/ 2018-11-07 22:20:31 +01:00
jvoisin
3aa76cc58e Prove that the previous commit is working 2018-11-07 22:13:36 +01:00
jvoisin
8ff57c5803 Do not display control characters in output
Kudos to Sherry Taylor for reporting this issue ♥
2018-11-07 22:07:46 +01:00
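A hedged illustration of the kind of fix involved: strip Unicode control characters (category `Cc`) from metadata values before printing them, so that a crafted file can't inject terminal escape sequences via `--show`. The helper name is made up for the example:

```python
import unicodedata

def sanitize(value: str) -> str:
    """Replace every control character by U+FFFD before display."""
    return ''.join(c if unicodedata.category(c) != 'Cc' else '\N{REPLACEMENT CHARACTER}'
                   for c in value)

print(sanitize('Author\x1b[2J'))  # the escape byte is neutralised
```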
jvoisin
04bb8c8ccf Add mp4 support 2018-10-28 07:41:04 -07:00
jvoisin
3a070b0ab7 Add support for zip files 2018-10-25 11:56:46 +02:00
jvoisin
283e5e5787 Improve archive-based parser's robustness against corrupted embedded files 2018-10-25 11:56:12 +02:00
jvoisin
513d897ea0 Implement get_meta() for archives 2018-10-25 11:29:50 +02:00
jvoisin
5a9dc388ad Minor refactorisation of how we're checking for exiftool's presence 2018-10-25 11:05:06 +02:00
jvoisin
5a08f5b7bf Add a test for tiff lightweight cleaning 2018-10-24 20:19:36 +02:00
jvoisin
fe885babee Implement lightweight cleaning for jpg 2018-10-24 19:35:07 +02:00
jvoisin
1040a594d6 Fix a stupid typo in the changelog 2018-10-23 17:13:53 +02:00
jvoisin
e510a225e3 Bump the changelog 2018-10-23 17:07:42 +02:00
jvoisin
a98962a0fa Document that FFmpeg is now an optional dependency 2018-10-23 16:57:18 +02:00
jvoisin
9a81b3adfd Improve type annotation coverage 2018-10-23 16:32:28 +02:00
jvoisin
f1a071d460 Implement lightweight cleaning for png and tiff 2018-10-23 16:22:11 +02:00
jvoisin
38df679a88 Optimize the handling of problematic files 2018-10-23 13:49:58 +02:00
jvoisin
44f267a596 Improve problematic filenames support 2018-10-22 16:56:05 +02:00
jvoisin
5bc88faedf Fix the testsuite on fedora 2018-10-22 13:55:09 +02:00
jvoisin
83389a63e9 Test mat2's reliability wrt. corrupted video files 2018-10-22 13:42:04 +02:00
jvoisin
e70ea811c9 Implement support for .avi files, via ffmpeg
- This commit introduces optional dependencies (namely ffmpeg):
  mat2 will emit a warning when trying to process an .avi file
  if ffmpeg isn't installed.
- Since metadata are obtained via exiftool, this commit
  also refactors our exiftool wrapper a bit.
2018-10-22 12:58:01 +02:00
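A rough sketch, with assumed names, of how an optional external dependency like ffmpeg can be probed at runtime, so that mat2 only warns instead of crashing when it is missing:

```python
import logging
import shutil

def require_ffmpeg() -> str:
    """Return the path to ffmpeg, or raise after logging a warning."""
    path = shutil.which('ffmpeg')
    if path is None:
        logging.warning("ffmpeg isn't installed: .avi files can't be processed")
        raise RuntimeError('ffmpeg is missing')
    return path
```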
jvoisin
2ae5d909c3 Make pyflakes happy 2018-10-18 21:22:28 +02:00
jvoisin
5896387ade Output metadata in a sorted fashion 2018-10-18 21:17:12 +02:00
jvoisin
d4c050a738 wtf python 2018-10-18 20:29:50 +02:00
jvoisin
f04d4b28fc Fix the tests on Debian? 2018-10-18 20:23:00 +02:00
jvoisin
da88d30689 Fix the CI on debian 2018-10-14 10:59:50 +02:00
Rémi Oudin
f1552b2ccb Make testsuite fail if coverage is under 100%
Fixes issue #61
2018-10-12 17:07:56 +02:00
jvoisin
2ba38dd2a1 Bump mypy typing coverage 2018-10-12 14:32:09 +02:00
jvoisin
b832a59414 Refactor lightweight mode implementation 2018-10-12 11:49:24 +02:00
Sébastien Helleu
6ce88b8b7f Fix typo in README 2018-10-11 21:40:58 +02:00
jvoisin
2444caccc0 Make pylint happier 2018-10-11 19:55:07 +02:00
jvoisin
b9dbd12ef9 Implement recursive metadata for FLAC files
Since FLAC files can contain covers, it makes sense
to parse their metadata too.
2018-10-11 19:52:47 +02:00
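A small sketch of what recursing into FLAC covers can look like with mutagen; `FLAC(...).pictures` is a real mutagen attribute, while the surrounding function is illustrative only (a real implementation would hand each picture to the matching image parser):

```python
from mutagen.flac import FLAC

def get_cover_meta(path: str) -> dict:
    """Collect basic information about every picture embedded in a FLAC file."""
    meta = {}
    for idx, picture in enumerate(FLAC(path).pictures):
        meta['cover_%d' % idx] = {'mime': picture.mime, 'size': len(picture.data)}
    return meta
```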
jvoisin
b2e153b69c Delete pictures of FLAC files 2018-10-11 18:15:11 +02:00
Simon Magnin
35dca4bf1c add recursivity for archive style files 2018-10-11 08:28:02 -07:00
jvoisin
4ed30b5e00 Add the mailing list announcement to the release process 2018-10-06 20:00:50 +02:00
jvoisin
0d25b18d26 Improve both the typing and the comments 2018-10-05 17:07:58 +02:00
jvoisin
d0f3534eff Hide unsupported extensions in mat2 -l 2018-10-05 12:43:21 +02:00
jvoisin
8675706c93 Improve the display of mat2 when no metadata are found
This should close #74
2018-10-05 12:35:35 +02:00
Poncho
5e196ecef8 Update logo
Use color palette and size according to
https://developer.gnome.org/hig/stable/icon-design.html.en
2018-10-05 11:13:31 +02:00
jvoisin
8e98593b02 Trash word/people.xml in office files 2018-10-04 16:28:20 +02:00
jvoisin
df252fd71a Remove a superfluous import 2018-10-04 16:19:38 +02:00
jvoisin
a1c39104fc Make the testsuite runnable on the installed MAT2 2018-10-04 16:16:52 +02:00
georg
34fbd633fd libmat2: fix shebang
Relates 0a2a398c9c
2018-10-03 18:38:28 +00:00
jvoisin
f1ceed13b5 Bump the changelog 2018-10-03 16:38:05 +02:00
jvoisin
5a5c642a46 Don't break office files for MS Office
We didn't take the whitelist into account while
removing dangling files from [Content_types].xml
2018-10-03 16:38:05 +02:00
jvoisin
84e302ac93 Remove file left behind by the testsuite 2018-10-03 16:38:05 +02:00
jvoisin
7901fdef2e Fix the testsuite 2018-10-03 15:29:46 +02:00
jvoisin
1b356b8c6f Improve mat2's cli reliability
- Replace some class members by instance members
- Don't thread the cleaning process anymore for now
2018-10-03 15:22:36 +02:00
jvoisin
c67bbafb2c Use [Content_Types].xml to improve MS Office coverage 2018-10-02 11:55:42 -07:00
georg
5b606f939d fix typo 2018-10-02 16:01:24 +00:00
jvoisin
156e81fb4c Check that cleaning twice doesn't break the file 2018-10-02 16:05:51 +02:00
jvoisin
9578e4b4ee Silence a bit the testsuite 2018-10-02 15:26:13 +02:00
jvoisin
a46a7eb6fa Update the CONTRIBUTING.md file wrt. the previous commit 2018-10-02 11:12:50 +02:00
georg
a24c59b208 manpage: this is about mat2, not mat 2018-10-01 21:26:59 +00:00
jvoisin
652b8e519f Files processed via MAT2 are now accepted without warnings by MS Office 2018-10-01 12:25:37 -07:00
jvoisin
c14be47f95 Fix a typo in the README spotted by @georg 2018-10-01 15:51:22 +02:00
jvoisin
81a3881aa4 Please mypy 2018-09-30 19:55:17 +02:00
jvoisin
e342671ead Remove dangling references in MS Office's [Content_types].xml 2018-09-30 19:53:18 +02:00
jvoisin
212d9c472c Document mat2's output scheme in the manpage as well 2018-09-26 00:13:44 +02:00
jvoisin
a88107c9ca Document the output scheme in the README 2018-09-26 00:11:16 +02:00
jvoisin
7f629ed2e3 Run the testsuite exclusively on Whitewhale for now
This should fix the intermittent failures, thanks
to @pollo for the tip
2018-09-25 17:09:04 +02:00
jvoisin
719cdf20fa Second pass of minor formatting 2018-09-24 20:15:07 +02:00
jvoisin
2e243355f5 Fix some minor formatting issues 2018-09-24 19:50:24 +02:00
jvoisin
174d4a0ac0 Implement rsid stripping for office files
MS Office XML rsid is a "unique identifier used to track the editing session
when the physical character representing this section mark was last formatted."

See the following links for details:
- https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
- https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/.
2018-09-24 18:03:59 +02:00
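As an illustration of the technique (assumed names, simplified namespace handling), rsid stripping boils down to deleting every attribute in the WordprocessingML namespace whose local name starts with `rsid`:

```python
import xml.etree.ElementTree as ET

W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

def strip_rsid(document_xml: bytes) -> bytes:
    """Remove w:rsid* attributes (w:rsidR, w:rsidRDefault, ...) from a document part."""
    root = ET.fromstring(document_xml)
    for element in root.iter():
        for attribute in list(element.attrib):
            if attribute.startswith(W_NS + 'rsid'):
                del element.attrib[attribute]
    return ET.tostring(root)
```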
jvoisin
fbcf68c280 Lexicographical sort on xml attributes for office files
In XML, the order of the attributes shouldn't be meaningful;
however, MS Office sorts attributes for a given XML tag
differently than LibreOffice does.
2018-09-24 17:45:09 +02:00
jvoisin
9826de3526 Add a test for zip ordering 2018-09-20 14:04:46 +02:00
jvoisin
ab71c29a28 Make pyflakes happy 2018-09-20 01:19:22 +02:00
jvoisin
3d2842802c Split the tests 2018-09-20 01:13:59 +02:00
jvoisin
a1a06d023e Insert archive members in lexicographic order 2018-09-18 22:44:21 +02:00
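A minimal sketch of the idea, with an illustrative function name: rewrite the archive with its members added in lexicographic order, so the output no longer leaks the original insertion order:

```python
import zipfile

def rewrite_sorted(src: str, dst: str) -> None:
    """Copy a zip archive, inserting its members in lexicographic order."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, 'w') as zout:
        for name in sorted(zin.namelist()):
            zout.writestr(name, zin.read(name))
```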
jvoisin
9275d64be5 Add a link to the gentoo overlay 2018-09-17 21:11:48 +02:00
Yoann Lamouroux
0a2a398c9c trivial modification of all shebang.
`/usr/bin/python3` -> `/usr/bin/env python3`

It's always better to trust the environment-defined path to bin/python, as
virtualenv has become the way to go.
2018-09-12 14:58:27 +02:00
jvoisin
5cf94bd256 Bump coverage back to 100% 2018-09-12 14:54:54 +02:00
jvoisin
de65f4f4d4 Improve the resilience of MAT2 wrt. corrupted PNG 2018-09-09 19:09:05 +02:00
jvoisin
759efa03ee Fix a setuptool-related warning 2018-09-06 11:42:07 +02:00
jvoisin
9fe6f1023b Make pylint happy 2018-09-06 11:36:04 +02:00
jvoisin
e3d817f57e Split office and archives 2018-09-06 11:34:14 +02:00
jvoisin
2e9adab86a Improve a cli test resilience 2018-09-06 11:32:29 +02:00
jvoisin
c8c27dcf38 Mention "Scrambled Exif" as related software 2018-09-06 11:20:08 +02:00
jvoisin
120b204988 Change a bit the previous commit 2018-09-06 11:13:11 +02:00
Daniel Kahn Gillmor
f3cef319b9 Unknown Members: make policy use an Enum
Closes #60

Note: this changeset also ensures that clean.cleaned.docx is removed
after the pytest run is over.
2018-09-05 18:59:33 -04:00
Daniel Kahn Gillmor
2d9ba81a84 spelling correction.
while mat2 has both a thread model (a thread pool that strips metadata
in parallel) and a threat model (a list of malicious adversaries and
their capabilities that we are trying to defeat), i think this
paragraph is talking about the latter.
2018-09-05 13:00:28 -04:00
jvoisin
072ee1814d Remove defusedxml support and document why 2018-09-05 18:41:08 +02:00
jvoisin
3649c0ccaf Remove short version of dangerous/advanced options 2018-09-05 17:48:14 +02:00
Christian
119085f28d Add missing dependencies for the Nautilus extension to INSTALL.md 2018-09-05 17:42:39 +02:00
Christian
e515d907d7 Make sure target directory exists, assume MAT2 is in parent directory 2018-09-05 17:42:13 +02:00
jvoisin
46bb1b83ea Improve the previous commit 2018-09-05 17:26:09 +02:00
Daniel Kahn Gillmor
1d7e374e5b office: try all members, even when one fails
the end result will be the same -- an abort -- but the user will get
to see all the warnings for a particular file, instead of getting them
one at a time.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor
915dc634c4 document all unknown/unhandlable files even on abort
This makes it easy to get a list of all files that mat2 doesn't know
how to handle, without having to choose -u keep or -u omit.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor
10d60bd398 add --unknown-members argument to mat2
This allows the user to make use of parser.unknown_member_policy for
archive formats.

At the suggestion of @jvoisin, it also prints a scary warning if the
user explicitly chooses 'keep'.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor
4192a2daa3 office: create policy for what to do about unknown members
previously, encountering an unknown member meant that any parser of
this type would abort.

now, the user can set parser.unknown_member_policy to either 'omit' or
'keep' if they don't want the current action of 'abort'

note that this causes pylint to complain about branching depth for
remove_all() because of the nuanced error-handling.  I've disabled
this check.
2018-09-04 16:13:33 -04:00
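In outline, the policy described in the commits above can be modelled as an Enum; the names below are illustrative and not necessarily the ones used in libmat2:

```python
import enum

class UnknownMemberPolicy(enum.Enum):
    ABORT = 'abort'   # default: refuse to clean the archive
    OMIT = 'omit'     # drop unknown members from the cleaned archive
    KEEP = 'keep'     # copy them verbatim, at the user's own risk

# Turning the cli string into a policy; an invalid value raises ValueError.
policy = UnknownMemberPolicy('omit')
```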
jvoisin
9ce458cb3b Update the release process to create signed tarballs 2018-09-03 14:28:00 +02:00
jvoisin
907fc591cc Bump the coverage back to 100% 2018-09-01 16:58:34 +02:00
jvoisin
8255293d1d Add a link to the mailing list 2018-09-01 16:45:20 +02:00
jvoisin
6b7e8ad8c0 Add a .mailmap file 2018-09-01 16:12:03 +02:00
jvoisin
b7a8622682 Bump the changelog 2018-09-01 16:00:41 +02:00
Daniel Kahn Gillmor
3e2890eb9e three minor spelling fixes 2018-09-01 06:47:22 -07:00
jvoisin
91e80527fc Add archlinux to the CI 2018-09-01 15:41:22 +02:00
jvoisin
7877ba0da5 Fix a minor formatting issue 2018-09-01 14:16:55 +02:00
dkg
e2634f7a50 Logging cleanup 2018-09-01 05:14:32 -07:00
jvoisin
aba9b72d2c Fix some leftovers from the previous commit 2018-08-26 01:10:48 +02:00
Antoine Tenart
15dd3d84ff nautilus: rename the nautilus plugin
Rename the Nautilus plugin (removing 'nautilus' from the file name) as
it already lives in its own 'nautilus' directory. The same argument
applies when installing the plugin in a distro.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-26 01:09:41 +02:00
Antoine Tenart
588466f4a8 INSTALL: add instructions for the Fedora copr
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-24 18:47:39 +02:00
Antoine Tenart
cf89ff45c2 gitignore: exclude all hidden files from being committed
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-24 09:14:05 +02:00
Antoine Tenart
f583d12564 nautilus: remove swp file
A .swp file was committed by mistake. Remove it.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-24 09:09:49 +02:00
jvoisin
1c72448e58 Improve the detection of unsupported extensions in uppercase 2018-08-23 21:28:37 +02:00
Antoine Tenart
f068621628 libmat2: images: fix handling of .JPG files
Pixbuf only supports .jpeg files, not .jpg, so libmat2 looks for such an
extension and converts it if necessary. As this check is case sensitive,
processing .JPG files does not work.

Fixes #47.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-23 20:43:27 +02:00
jvoisin
fe09d81ab1 Don't forget to tell the downstreams about new releases 2018-08-19 15:51:44 +02:00
jvoisin
5be66dbe91 Mention the Arch linux's AUR package of MAT2 2018-08-19 15:51:23 +02:00
jvoisin
ee496cfa7f Fix a typo spotted by @Francois_B 2018-08-19 15:51:09 +02:00
jvoisin
6e2e411a2a Add an INSTALL.md file 2018-08-08 20:45:09 +02:00
jvoisin
2ce1dc793e Bump the changelog 2018-08-03 22:20:24 +02:00
jvoisin
e27768824a Change mat2's logo 2018-08-03 21:45:41 +02:00
jvoisin
36c5bad140 Improve our .gitignore 2018-07-30 23:00:33 +02:00
jvoisin
b5a9520a60 Add a cli-related test 2018-07-30 22:54:41 +02:00
jvoisin
a1257c538b Add some tests about pathological files 2018-07-30 22:36:36 +02:00
Antoine Tenart
6d8e999f12 Rename image to icon in the Nautilus extension
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-26 09:01:27 +02:00
Antoine Tenart
1bc4c7aac9 Switch columns in the Nautilus extension
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-26 09:01:01 +02:00
Antoine Tenart
03245a8731 Rename the Nautilus path column to file
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-26 08:57:33 +02:00
Antoine Tenart
27445e9134 Rename the Nautilus exit button to close
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-26 08:57:09 +02:00
jvoisin
b32ba9f736 Improve a bit nautilus' popup 2018-07-25 22:48:05 +02:00
jvoisin
e9f28edf73 Add a man page and document how to keep it up to date 2018-07-24 22:34:33 +02:00
jvoisin
7697f9c085 Improve the linters' coverage 2018-07-23 23:55:45 +02:00
jvoisin
e920083559 The Nautilus extension is now working 2018-07-23 23:39:06 +02:00
georg
71b1ced842 AbstractParser: Fix typos 2018-07-21 00:46:48 +00:00
jvoisin
942859601d Improve the code's documentation 2018-07-19 23:10:27 +02:00
jvoisin
565cb66d14 Minor simplification in how we're handling xml for office files 2018-07-19 22:55:08 +02:00
jvoisin
052a356750 Implement a much better Nautilus extension thanks to @atenart
Co-authored-by: Antoine Tenart <antoine.tenart@ack.tf>
Co-authored-by: jvoisin <julien.voisin@dustri.org>
2018-07-19 00:11:30 +02:00
jvoisin
2f670651cf Minor cleanup of the Nautilus extension's code 2018-07-18 23:20:51 +02:00
jvoisin
0cd510938a Minor code simplification 2018-07-18 23:15:47 +02:00
jvoisin
dc026f99ad Show if files are supported or not in the Nautilus extension 2018-07-18 23:12:55 +02:00
jvoisin
0aac0d644d Show a pretty icon for files in the Nautilus extension 2018-07-18 22:53:56 +02:00
jvoisin
17e69b6005 Change a button in the nautilus extension 2018-07-18 22:39:18 +02:00
jvoisin
cf5f3b268d Add a separator for the Nautilus extension 2018-07-18 22:39:10 +02:00
jvoisin
a5eede9a21 Remove the disclaimer from the Nautilus extension 2018-07-18 22:38:42 +02:00
Antoine Tenart
926e8dac5f nautilus: first working version
Improve the nautilus extension to get to a first working version:
- Single and multiple selections are working.
- The menu item is only there if mat2 has a chance to work on the
  selected files.
- Errors are reported using notifications.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-07-18 22:38:05 +02:00
georg
edc5f86552 README: Fix typo 2018-07-16 15:09:22 +00:00
jvoisin
84d50f97c0 Add a check for a missed dependency in ./mat2 -c 2018-07-15 17:00:01 +02:00
jvoisin
8093dce88e Bump the changelog 2018-07-10 21:41:24 +02:00
jvoisin
5a7c7f35f7 Remove print from libmat, and use the logging module instead
This should close #28
2018-07-10 21:30:38 +02:00
jvoisin
d5861e4653 Implement a check for dependencies in mat2
Example use:

```
$ mat2 -c
Dependencies required for MAT2 0.1.3:
- Cairo: yes
- Exiftool: yes
- GdkPixbuf from PyGobject: yes
- Mutagen: yes
- Poppler from PyGobject: yes
- PyGobject: yes
```

This should close #35
2018-07-10 21:24:26 +02:00
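One possible way to implement such a check (simplified, with assumed module names): probe the Python modules with importlib and the external binary with shutil.which, then print yes/no for each:

```python
import importlib
import shutil

def check_dependencies() -> dict:
    ret = {'Exiftool': bool(shutil.which('exiftool'))}
    for name, module in (('Mutagen', 'mutagen'), ('PyGobject', 'gi'), ('Cairo', 'cairo')):
        try:
            importlib.import_module(module)
            ret[name] = True
        except ImportError:
            ret[name] = False
    return ret

for key, value in sorted(check_dependencies().items()):
    print('- %s: %s' % (key, 'yes' if value else 'no'))
```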
jvoisin
22e3918f67 Add pylint3 to the ci 2018-07-09 01:22:08 +02:00
jvoisin
080d6769ca Make pylint even happier 2018-07-09 01:11:44 +02:00
jvoisin
86fe3aa584 Fix the previous commit 2018-07-09 00:30:16 +02:00
jvoisin
cc327b1592 Minor improvement of fedora's duration in the testsuite 2018-07-09 00:27:40 +02:00
jvoisin
b4edd6d2a2 Document that MAT2 not being able to detect metadata doesn't mean that the file is clean 2018-07-09 00:17:59 +02:00
jvoisin
bd357b85f8 Remove a useless option that was never implemented anyway 2018-07-09 00:13:16 +02:00
jvoisin
8c21006e6c Fix some pep8 issues spotted by pyflakes 2018-07-08 22:40:36 +02:00
jvoisin
f49aa5cab7 Achieve 100% coverage! 2018-07-08 22:27:37 +02:00
jvoisin
52a2c800b7 Bump coverage again 2018-07-08 21:50:52 +02:00
jvoisin
ad3e7ccee8 Bump coverage for office files and fix some related crashes 2018-07-08 21:35:45 +02:00
jvoisin
ca01484126 Silence a mypy's stupid warning 2018-07-08 17:12:17 +02:00
jvoisin
f9bc022c96 Add defusedxml as an (optional) way to prevent XML-based attacks
Those attacks are DoS-only.
2018-07-08 17:07:26 +02:00
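The optional use can be as simple as a guarded import that falls back to the standard library when defusedxml isn't installed; a hedged sketch, keeping in mind that defusedxml only mitigates DoS-style XML attacks here:

```python
try:
    from defusedxml import ElementTree as ET  # hardened drop-in replacement
except ImportError:
    import xml.etree.ElementTree as ET

root = ET.fromstring('<metadata><author>nobody</author></metadata>')
```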
jvoisin
72e1fda18d Remove a leftover print 2018-07-08 15:19:18 +02:00
jvoisin
3cd4f9111f Bump coverage for torrent handling 2018-07-08 15:13:03 +02:00
jvoisin
b5fcddd6a6 Simplify how torrent files are handled
- Rework the testsuite wrt. torrents
- Fail at parser instantiation on corrupted torrents,
  instead of during the `get_meta` or `remove_all` call
2018-07-08 13:49:11 +02:00
jvoisin
7ea362d908 Bump the coverage for pdf 2018-07-07 18:12:33 +02:00
jvoisin
85455a4419 Fix a mistake in office file revisions handling 2018-07-07 18:05:54 +02:00
jvoisin
9f631a1bb1 Bump a bit the coverage 2018-07-07 18:02:53 +02:00
51 changed files with 3231 additions and 591 deletions

.gitignore

@@ -1,5 +1,9 @@
.*
*.pyc
.coverage
.eggs
.mypy_cache/
build
dist
mat2.egg-info
tags

.gitlab-ci.yml

@@ -9,14 +9,25 @@ bandit:
script: # TODO: remove B405 and B314
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends python3-bandit
- bandit ./mat2 --format txt --skip B101
- bandit -r ./nautilus/ --format txt --skip B101
- bandit -r ./libmat2 --format txt --skip B101,B404,B603,B405,B314
pylint:
stage: linting
script:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends pylint3 python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0
- pylint3 --extension-pkg-whitelist=cairo,gi ./libmat2 ./mat2
# Once nautilus-python is in Debian, decomment it from the line below
- pylint3 --extension-pkg-whitelist=Nautilus,GObject,Gtk,Gio,GLib,gi ./nautilus/mat2.py
pyflakes:
stage: linting
script:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends pyflakes3
- pyflakes3 ./libmat2 ./mat2 ./tests/
- pyflakes3 ./libmat2 ./mat2 ./tests/ ./nautilus
mypy:
stage: linting
@@ -24,23 +35,42 @@ mypy:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends python3-pip
- pip3 install mypy
- mypy mat2 libmat2/*.py --ignore-missing-imports
- mypy --ignore-missing-imports mat2 libmat2/*.py ./nautilus/mat2.py
tests:debian:
stage: test
script:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage
- apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage ffmpeg
- apt-get -qqy purge bubblewrap
- python3-coverage run --branch -m unittest discover -s tests/
- python3-coverage report -m --include 'libmat2/*'
- python3-coverage report --fail-under=90 -m --include 'libmat2/*'
tests:debian_with_bubblewrap:
stage: test
tags:
- whitewhale
script:
- apt-get -qqy update
- apt-get -qqy install --no-install-recommends python3-mutagen python3-gi-cairo gir1.2-poppler-0.18 gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl python3-coverage ffmpeg bubblewrap
- python3-coverage run --branch -m unittest discover -s tests/
- python3-coverage report --fail-under=100 -m --include 'libmat2/*'
tests:fedora:
image: fedora
stage: test
tags:
- whitewhale
script:
- dnf install -y python3 python3-mutagen python3-gobject
- dnf install -y gdk-pixbuf2 poppler-glib gdk-pixbuf2 gdk-pixbuf2-modules
- dnf install -y cairo-gobject cairo python3-cairo
- dnf install -y perl-Image-ExifTool mailcap
- dnf install -y python3 python3-mutagen python3-gobject gdk-pixbuf2 poppler-glib gdk-pixbuf2 gdk-pixbuf2-modules cairo-gobject cairo python3-cairo perl-Image-ExifTool mailcap
- gdk-pixbuf-query-loaders-64 > /usr/lib64/gdk-pixbuf-2.0/2.10.0/loaders.cache
- python3 setup.py test
tests:archlinux:
image: archlinux/base
stage: test
tags:
- whitewhale
script:
- pacman -Sy --noconfirm python-mutagen python-gobject gdk-pixbuf2 poppler-glib gdk-pixbuf2 python-cairo perl-image-exiftool python-setuptools mailcap ffmpeg
- python3 setup.py test

.mailmap (new file)

@@ -0,0 +1,5 @@
Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org> totallylegit <totallylegit@dustri.org>
Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org> jvoisin <julien.voisin@dustri.org>
Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org> jvoisin <jvoisin@riseup.net>
Daniel Kahn Gillmor <dkg@fifthhorseman.net> dkg <dkg@fifthhorseman.net>

.pylintrc (new file)

@@ -0,0 +1,17 @@
[FORMAT]
good-names=e,f,i,x,s
max-locals=20
[MESSAGES CONTROL]
disable=
fixme,
invalid-name,
duplicate-code,
missing-docstring,
protected-access,
abstract-method,
wrong-import-position,
catching-non-exception,
cell-var-from-loop,
locally-disabled,
invalid-sequence-index, # pylint doesn't like things like `Tuple[int, bytes]` in type annotation

CHANGELOG.md

@@ -1,3 +1,83 @@
# 0.7.0 - 2019-02-17
- Add support for wmv files
- Add support for gif files
- Add support for html files
- Sandbox external processes via bubblewrap
- Simplify archive-based formats processing
- The Nautilus extension now plays nicer with other extensions
# 0.6.0 - 2018-11-10
- Add lightweight cleaning for jpeg
- Add support for zip files
- Add support for mp4 files
- Improve metadata extraction for archives
- Improve robustness against corrupted embedded files
- Fix a possible security issue on some terminals (control character
injection via --show)
- Various internal cleanup/improvements
# 0.5.0 - 2018-10-23
- Video (.avi files for now) support, via FFmpeg, optionally
- Lightweight cleaning for png and tiff files
- Processing files starting with a dash is now quicker
- Metadata are now displayed sorted
- Recursive metadata support for FLAC files
- Unsupported extensions aren't displayed in `./mat2 -l` anymore
- Improve the display when no metadata are found
- Update the logo according to the GNOME guidelines
- The testsuite is now runnable on the installed version of mat2
- Various internal cleanup/improvements
# 0.4.0 - 2018-10-03
- There is now a policy, for advanced users, to deal with unknown embedded fileformats
- Improve the documentation
- Various minor refactoring
- Improve how corrupted PNG are handled
- Dangerous/advanced cli's options no longer have short versions
- Significant improvements to office files anonymisation
- Archive members are sorted lexicographically
- XML attributes are sorted lexicographically too
- RSID are now stripped
- Dangling references in [Content_types].xml are now removed
- Significant improvements to office files support
- Anonymised office files can now be opened by MS Office without warnings
- The CLI isn't threaded anymore, for it was causing issues
- Various misc typo fix
# 0.3.1 - 2018-09-01
- Document how to install MAT2 for various distributions
- Fix various typos in the documentation/comments
- Add ArchLinux to the CI to ensure that MAT2 is running on it
- Fix the handling of files with a name ending in `.JPG`
- Improve the detection of unsupported extensions in upper-case
- Streamline MAT2's logging
# 0.3.0 - 2018-08-03
- Add a check for missing dependencies
- Add Nautilus extension
- Minor code simplifications
- Improve our linters' coverage
- Add a manpage
- Add folder/multiple files related tests
- Change the logo
# 0.2.0 - 2018-07-10
- Fix various crashes due to malformed files
- Simplify various code-paths
- Remove superfluous debug message
- Remove the `--check` option that never was implemented anyway
- Add a `-c` option to check for MAT2's dependencies
# 0.1.3 - 2018-07-06
- Improve MAT2 resilience against corrupted images

CONTRIBUTING.md

@@ -24,6 +24,14 @@ Since MAT2 is written in Python3, please conform as much as possible to the
1. Update the [changelog](https://0xacab.org/jvoisin/mat2/blob/master/CHANGELOG.md)
2. Update the version in the [mat2](https://0xacab.org/jvoisin/mat2/blob/master/mat2) file
3. Update the version in the [setup.py](https://0xacab.org/jvoisin/mat2/blob/master/setup.py) file
4. Commit the changelog, mat2 and setup.py files
5. Create a tag with `git tag -s $VERSION`
6. Push the tag with `git push --tags`
4. Update the version and date in the [man page](https://0xacab.org/jvoisin/mat2/blob/master/doc/mat2.1)
5. Commit the changelog, man page, mat2 and setup.py files
6. Create a tag with `git tag -s $VERSION`
7. Push the commit with `git push origin master`
8. Push the tag with `git push --tags`
9. Create the signed tarball with `git archive --format=tar.xz --prefix=mat-$VERSION/ $VERSION > mat-$VERSION.tar.xz`
10. Sign the tarball with `gpg --armor --detach-sign mat-$VERSION.tar.xz`
11. Upload the result on Gitlab's [tag page](https://0xacab.org/jvoisin/mat2/tags) and add the changelog there
12. Announce the release on the [mailing list](https://mailman.boum.org/listinfo/mat-dev)
13. Upload the new version on pypi with `python3 setup.py sdist bdist_wheel` then `twine upload -s dist/*`
14. Do the secret release dance

INSTALL.md (new file)

@@ -0,0 +1,76 @@
# Python ecosystem
If you feel like running arbitrary code downloaded over the
internet (pypi doesn't support gpg signatures [anymore](https://github.com/pypa/python-packaging-user-guide/pull/466)),
mat2 is [available on pypi](https://pypi.org/project/mat2/), and can be
installed like this:
```
pip3 install mat2
```
# GNU/Linux
## Optional dependencies
When [bubblewrap](https://github.com/projectatomic/bubblewrap) is
installed, MAT2 uses it to sandbox any external processes it invokes.
## Fedora
Thanks to [atenart](https://ack.tf/), there is a package available on
[Fedora's copr]( https://copr.fedorainfracloud.org/coprs/atenart/mat2/ ).
We use copr (cool other packages repo) as the Mat2 Nautilus plugin depends on
python3-nautilus, which isn't available yet in Fedora (but is distributed
through this copr).
First you need to enable Mat2's copr:
```
dnf -y copr enable atenart/mat2
```
Then you can install both the Mat2 command and Nautilus extension:
```
dnf -y install mat2 mat2-nautilus
```
## Debian
There is a package available in Debian *buster/sid*. The package [doesn't include
the Nautilus extension yet](https://bugs.debian.org/910491).
For Debian 9 *stretch*, there is a way to install it *manually*:
```
# apt install python3-mutagen python3-gi-cairo gir1.2-gdkpixbuf-2.0 libimage-exiftool-perl gir1.2-glib-2.0 gir1.2-poppler-0.18 ffmpeg
# apt install bubblewrap # if you want sandboxing
$ git clone https://0xacab.org/jvoisin/mat2.git
$ cd mat2
$ ./mat2
```
and if you want to install the über-fancy Nautilus extension:
```
# apt install gnome-common gtk-doc-tools libnautilus-extension-dev python-gi-dev python3-dev build-essential
$ git clone https://github.com/GNOME/nautilus-python
$ cd nautilus-python
$ PYTHON=/usr/bin/python3 ./autogen.sh
$ make
# make install
$ mkdir -p ~/.local/share/nautilus-python/extensions/
$ cp ../nautilus/mat2.py ~/.local/share/nautilus-python/extensions/
$ PYTHONPATH=/home/$USER/mat2 PYTHON=/usr/bin/python3 nautilus
```
## Arch Linux
Thanks to [Francois_B](https://www.sciunto.org/), there is a package available on
[Arch linux's AUR](https://aur.archlinux.org/packages/mat2/).
## Gentoo
MAT2 is available in the [torbrowser overlay](https://github.com/MeisterP/torbrowser-overlay).

README.md

@@ -1,6 +1,6 @@
```
_____ _____ _____ ___
| | _ |_ _|_ | Keep you data,
| | _ |_ _|_ | Keep your data,
| | | | | | | | _| trash your meta!
|_|_|_|__|__| |_| |___|
@@ -30,10 +30,11 @@ metadata.
- `python3-mutagen` for audio support
- `python3-gi-cairo` and `gir1.2-poppler-0.18` for PDF support
- `gir1.2-gdkpixbuf-2.0` for images support
- `FFmpeg`, optionally, for video support
- `libimage-exiftool-perl` for everything else
Please note that MAT2 requires at least Python3.5, meaning that it
doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3),
doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3).
# Running the test suite
@@ -41,40 +42,79 @@ doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3),
$ python3 -m unittest discover -v
```
And if you want to see the coverage:
```bash
$ python3-coverage run --branch -m unittest discover -s tests/
$ python3-coverage report -m --include 'libmat2/*'
```
# How to use MAT2
```bash
usage: mat2 [-h] [-v] [-l] [-c | -s | -L] [files [files ...]]
usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]
[--unknown-members policy] [-s | -L]
[files [files ...]]
Metadata anonymisation toolkit 2
positional arguments:
files
files the files to process
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-l, --list list all supported fileformats
-c, --check check if a file is free of harmful metadatas
-s, --show list all the harmful metadata of a file without removing
them
-L, --lightweight remove SOME metadata
-h, --help show this help message and exit
-v, --version show program's version number and exit
-l, --list list all supported fileformats
--check-dependencies check if MAT2 has all the dependencies it needs
-V, --verbose show more verbose status information
--unknown-members policy
how to handle unknown members of archive-style files
(policy should be one of: abort, omit, keep)
-s, --show list harmful metadata detectable by MAT2 without
removing them
-L, --lightweight remove SOME metadata
```
Note that MAT2 **will not** clean files in-place: for a file named
"myfile.png", it will produce a cleaned version named "myfile.cleaned.png".
# Notes about detecting metadata
While MAT2 does its very best to display metadata when the `--show` flag is
passed, a file isn't necessarily free of metadata just because MAT2 doesn't
show any: there is no reliable way to detect every single possible kind of
metadata in complex file formats.
This is why you shouldn't rely on the absence of detected metadata to decide
whether your file must be cleaned or not.
# Notes about the lightweight mode
By default, mat2 might alter the data of your files a bit, in order to remove
as much metadata as possible. For example, text in PDFs might not be selectable anymore,
compressed images might get compressed again, …
Since some users might be willing to trade the presence of some metadata in exchange
for the guarantee that mat2 won't modify the data of their files, there is the
`-L` flag that does precisely that.
# Related software
- The first iteration of [MAT](http://mat.boum.org)
- The first iteration of [MAT](https://mat.boum.org)
- [Exiftool](https://sno.phy.queensu.ca/~phil/exiftool/)
- [pdf-redact-tools](https://github.com/firstlookmedia/pdf-redact-tools), that
tries to deal with *printer dots* too.
- [pdfparanoia](https://github.com/kanzure/pdfparanoia), that removes
watermarks from PDF.
- [Scrambled Exif](https://f-droid.org/packages/com.jarsilio.android.scrambledeggsif/),
an open-source Android application to remove metadata from pictures.
# Contact
If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues).
If you think that a more private contact is needed (eg. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat@dustri.org`,
If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues)
or the [mailing list](https://mailman.boum.org/listinfo/mat-dev).
Should a more private contact be needed (eg. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat2@dustri.org`,
using the gpg key `9FCDEE9E1A381F311EA62A7404D041E8171901CC`.
# License
@@ -93,6 +133,7 @@ You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Copyright 2018 Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org>
Copyright 2016 Marie Rose for MAT2's logo
# Thanks

(binary image, not shown: 3.1 KiB before, 28 KiB after)

(logo SVG)

@@ -1,27 +1,630 @@
[The previous 27-line logo SVG is replaced by a ~630-line 128×128 drawing made with Inkscape from the GNOME "Adwaita Icon Template" (GNOME Design Team, CC-BY-SA 4.0); the full SVG source is not reproduced here.]
d="m 71.242911,274.02997 1.391592,0.20557 1.043694,3.01499 2.01781,0.68522 1.530751,1.57602 -0.904535,2.87795 -2.365707,2.32976 -0.139159,3.56317 -1.322013,1.98715 -2.504867,-1.85011 -0.278318,-2.67237 -1.530752,-1.78159 -1.113274,-3.08351 3.61814,-4.17987 z"
id="path3466"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 62.893354,276.5653 3.270244,1.16489 0.06958,3.70021 -0.556637,0.68523 0.974115,3.70021 1.252433,1.64454 0.06958,3.08351 -2.017809,1.37045 -2.574447,8.08566 -2.574447,-1.30193 -1.948229,-9.79872 z"
id="path3468"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 58.440258,283.5546 h 0.556637 l 0.417478,0.95931 -0.208739,1.30193 -1.461172,0.13704 z"
id="path3472"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 56.700767,279.16916 -1.113274,0.95931 0.834956,2.80943 1.600331,0.20556 0.487058,-2.05567 -0.695796,-1.91863 z"
id="path3474"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 53.152207,272.17987 0.139159,5.13918 1.87865,1.23341 0.834955,-0.54818 0.904535,-3.63169 1.530752,-1.57602 -1.669911,-3.97431 -3.548561,3.08352 z"
id="path3476"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 45.915924,258.33832 -0.208739,3.83726 -4.731414,3.97431 1.948229,2.80942 8.488716,0.82227 0.417478,1.98715 1.043694,-0.75375 0.487057,-2.19272 1.182854,-1.64454 -0.417478,-1.09635 -1.87865,-2.60386 -3.757299,-1.37045 -1.461174,-3.22056 z"
id="path3480"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 40.279975,263.68308 1.669912,0.6167 3.061502,-6.37259 -0.904535,-5.61884 -2.504867,-0.34262 -1.391592,-1.2334 2.156968,-7.606 -2.087388,-4.45396 -3.409402,1.57602 -0.834956,3.42612 -1.87865,0.20557 -0.347898,2.1242 1.530752,1.64454 h 1.322013 l 0.626217,3.90578 2.296127,5.61884 -0.347898,2.19272 z"
id="path3482"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 66.531337,247.61066 -0.590018,-0.31657 -0.420783,-1.71262 0.427793,-0.66945 1.306823,-1.13114 2.316342,-1.38746 1.06612,0.23465 -0.01701,2.21105 -2.36166,3.35302 z"
id="path4284"
inkscape:connector-curvature="0"
inkscape:transform-center-x="4.9927099"
inkscape:transform-center-y="-9.3161687" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 72.373733,232.22199 -0.815102,1.03206 4.017286,4.12827 1.571981,0.17201 1.339096,-0.86006 0.931544,0.63071 2.387083,-2.98152 -2.794634,-0.91739 -3.027519,0.22934 z"
id="path3601"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 57.407878,237.1102 -1.301737,2.34289 -1.301738,0.61888 -0.17955,1.45878 -4.488748,1.54719 -0.403989,1.50299 0.314213,0.30944 1.032412,0.0884 v 1.41457 l 1.660839,1.50299 2.154598,-1.94504 1.571064,0.35364 2.738136,-1.94504 -1.436399,-2.56392 0.987525,-3.44803 -0.583538,-1.37037 z"
id="path3603"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 62.104217,246.96003 5.843936,-6.55723 0.659867,-2.66044 2.221783,-0.40757 -0.386451,-3.39556 -2.000988,-0.60704 -6.246127,-0.36572 -2.624948,2.5137 1.519708,2.75102 -0.347742,5.51876 z"
id="path3605"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 71.024647,249.63275 5.822153,1.31875 1.047988,-3.89891 -1.280874,-1.43343 0.523995,-6.02038 -3.551515,5.275 0.34933,2.06413 -2.037753,0.80272 -1.164431,0.45869 z"
id="path3607"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 59.099222,247.24427 -2.095974,1.72011 -0.05822,1.60543 0.465772,1.72011 1.455539,0.97473 -0.407551,0.97473 2.328861,-0.34402 2.27064,-2.86685 -1.571981,-0.57337 -0.640437,-2.86685 -1.51376,-0.40136 z"
id="path3609"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 44.078067,234.34346 0.291107,4.47228 -1.863089,1.43342 2.095976,3.72691 2.037753,0.0573 2.27064,-3.55489 -2.969297,-4.98831 z"
id="path3611"
inkscape:connector-curvature="0" />
<path
style="fill:#1a5fb4;fill-opacity:1;fill-rule:evenodd;stroke:none;stroke-width:0.13671011px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 44.660282,245.46683 -3.318627,4.30027 1.339096,1.26141 2.561747,-0.28668 1.222652,-3.15354 z"
id="path3613"
inkscape:connector-curvature="0" />
</g>
</g>
</svg>

Before

Width:  |  Height:  |  Size: 1.7 KiB

After

Width:  |  Height:  |  Size: 34 KiB

View File

@@ -6,7 +6,7 @@ Lightweight cleaning mode
Due to *popular* request, MAT2 is providing a *lightweight* cleaning mode,
that only cleans the superficial metadata of your file, but not
the ones that might be in **embeded** resources. Like for example,
the ones that might be in **embedded** resources. Like for example,
images in a PDF or an office document.
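As an illustration, here is a minimal sketch of what the lightweight path looks like through the library API shown further down in this diff; it assumes libmat2 0.7.0 is importable and that ./photo.jpg exists, and mirrors the attribute the -L flag toggles.

from libmat2 import parser_factory

parser, mimetype = parser_factory.get_parser('./photo.jpg')
if parser is not None:
    parser.lightweight_cleaning = True  # attribute defined in libmat2/abstract.py below
    parser.remove_all()                 # writes ./photo.cleaned.jpg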
Revisions handling
@@ -61,3 +61,11 @@ Images handling
When possible, images are handled like PDF: rendered on a surface, then saved
to the filesystem. This ensures that all metadata is removed.
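A standalone sketch of that rendering round-trip for a PNG, using pycairo the same way PNGParser.remove_all does in libmat2/images.py below; 'input.png' is a placeholder path.

import cairo

surface = cairo.ImageSurface.create_from_png('input.png')  # re-render the pixel data
surface.write_to_png('input.cleaned.png')                  # into a brand new file, without metadata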
XML attacks
-----------
Since our threat model conveniently excludes files crafted to specifically
bypass MAT2, fileformats containing harmful XML are out of our scope.
But since MAT2 is using [etree](https://docs.python.org/3/library/xml.html#xml-vulnerabilities)
to process XML, it's "only" vulnerable to DoS, and not memory corruption:
odds are that the user will notice that the cleaning didn't succeed.

doc/mat2.1 (new file, 76 lines)
View File

@@ -0,0 +1,76 @@
.TH MAT2 "1" "February 2019" "MAT2 0.7.0" "User Commands"
.SH NAME
mat2 \- the metadata anonymisation toolkit 2
.SH SYNOPSIS
\fBmat2\fR [\-h] [\-v] [\-l] [\-V] [-s | -L] [\fIfiles\fR [\fIfiles ...\fR]]
.SH DESCRIPTION
.B mat2
removes metadata from various fileformats. It supports a wide variety of file
formats, audio, office, images, …
Be careful: mat2 does not clean files in-place; instead, it produces a new file with the word
"cleaned" inserted between the filename and its extension, for example "filename.cleaned.png"
for a file named "filename.png".
.SH OPTIONS
.SS "positional arguments:"
.TP
\fBfiles\fR
the files to process
.SS "optional arguments:"
.TP
\fB\-h\fR, \fB\-\-help\fR
show this help message and exit
.TP
\fB\-v\fR, \fB\-\-version\fR
show program's version number and exit
.TP
\fB\-l\fR, \fB\-\-list\fR
list all supported fileformats
.TP
\fB\-\-check\-dependencies\fR
check if MAT2 has all the dependencies it needs
.TP
\fB\-V\fR, \fB\-\-verbose\fR
show more verbose status information
.TP
\fB\-\-unknown-members\fR \fIpolicy\fR
how to handle unknown members of archive-style files (policy should be one of: abort, omit, keep)
.TP
\fB\-s\fR, \fB\-\-show\fR
list harmful metadata detectable by MAT2 without
removing them
.TP
\fB\-L\fR, \fB\-\-lightweight\fR
remove SOME metadata
.SH EXAMPLES
To remove all the metadata from a PDF file:
.PP
.nf
.RS
mat2 ./myfile.pdf
.RE
.fi
.PP
.SH BUGS
While mat2 does its very best to remove every single piece of metadata,
it's still in beta, and \fBsome\fR might remain. Should you encounter
some issues, check the bugtracker: https://0xacab.org/jvoisin/mat2/issues
.PP
Please use accordingly and be careful.
.SH AUTHOR
This software was made by Julien (jvoisin) Voisin with the support of the Tails project.
.SH COPYRIGHT
This software is released under the LGPLv3.
.SH "SEE ALSO"
.BR exiftool (1p)
.BR pdf-redact-tools (1)

View File

@@ -1,7 +1,18 @@
#!/bin/env python3
#!/usr/bin/env python3
import collections
import enum
import importlib
from typing import Dict, Optional
from . import exiftool, video
# make pyflakes happy
assert Dict
assert Optional
# A set of extensions that aren't supported, despite matching a supported mimetype
unsupported_extensions = {
UNSUPPORTED_EXTENSIONS = {
'.asc',
'.bat',
'.brf',
@@ -17,3 +28,35 @@ unsupported_extensions = {
'.xsd',
'.xsl',
}
DEPENDENCIES = {
'cairo': 'Cairo',
'gi': 'PyGobject',
'gi.repository.GdkPixbuf': 'GdkPixbuf from PyGobject',
'gi.repository.Poppler': 'Poppler from PyGobject',
'gi.repository.GLib': 'GLib from PyGobject',
'mutagen': 'Mutagen',
}
def check_dependencies() -> Dict[str, bool]:
ret = collections.defaultdict(bool) # type: Dict[str, bool]
ret['Exiftool'] = bool(exiftool._get_exiftool_path())
ret['Ffmpeg'] = bool(video._get_ffmpeg_path())
for key, value in DEPENDENCIES.items():
ret[value] = True
try:
importlib.import_module(key)
except ImportError: # pragma: no cover
ret[value] = False # pragma: no cover
return ret
@enum.unique
class UnknownMemberPolicy(enum.Enum):
ABORT = 'abort'
OMIT = 'omit'
KEEP = 'keep'
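A hedged usage sketch of the helper above, roughly what the --check-dependencies flag documented in the man page reports; it only assumes that libmat2 is importable.

from libmat2 import check_dependencies

for name, found in check_dependencies().items():
    print('{}: {}'.format(name, 'installed' if found else 'MISSING'))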

View File

@@ -1,27 +1,41 @@
import abc
import os
from typing import Set, Dict
import re
from typing import Set, Dict, Union
assert Set # make pyflakes happy
class AbstractParser(abc.ABC):
""" This is the base class of every parser.
It might yield `ValueError` on instantiation on invalid files,
and `RuntimeError` when something went wrong in `remove_all`.
"""
meta_list = set() # type: Set[str]
mimetypes = set() # type: Set[str]
def __init__(self, filename: str) -> None:
"""
:raises ValueError: Raised upon an invalid file
"""
if re.search('^[a-z0-9./]', filename) is None:
# Some parsers are calling external binaries,
# this prevents shell command injections
filename = os.path.join('.', filename)
self.filename = filename
fname, extension = os.path.splitext(filename)
self.output_filename = fname + '.cleaned' + extension
self.lightweight_cleaning = False
@abc.abstractmethod
def get_meta(self) -> Dict[str, str]:
def get_meta(self) -> Dict[str, Union[str, dict]]:
pass # pragma: no cover
@abc.abstractmethod
def remove_all(self) -> bool:
"""
:raises RuntimeError: Raised if the cleaning process went wrong.
"""
# pylint: disable=unnecessary-pass
pass # pragma: no cover
def remove_all_lightweight(self) -> bool:
""" Remove _SOME_ metadata. """
return self.remove_all()
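To illustrate the contract above, a minimal hypothetical subclass, not part of libmat2, close in spirit to the HarmlessParser shown later in this diff.

import shutil
from typing import Dict, Union

from libmat2 import abstract

class NullParser(abstract.AbstractParser):
    """ Hypothetical example: declare the supported mimetypes and implement the two abstract methods. """
    mimetypes = {'application/x-example', }

    def get_meta(self) -> Dict[str, Union[str, dict]]:
        return {}  # pretend the format cannot carry metadata

    def remove_all(self) -> bool:
        shutil.copy(self.filename, self.output_filename)  # <name>.cleaned.<extension>
        return True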

libmat2/archive.py (new file, 178 lines)
View File

@@ -0,0 +1,178 @@
import zipfile
import datetime
import tempfile
import os
import logging
import shutil
from typing import Dict, Set, Pattern, Union, Any
from . import abstract, UnknownMemberPolicy, parser_factory
# Make pyflakes happy
assert Set
assert Pattern
assert Union
class ArchiveBasedAbstractParser(abstract.AbstractParser):
""" Office files (.docx, .odt, …) are zipped files. """
def __init__(self, filename):
super().__init__(filename)
# Those are the files that have a format that _isn't_
# supported by MAT2, but that we want to keep anyway.
self.files_to_keep = set() # type: Set[Pattern]
# Those are the files that we _do not_ want to keep,
# no matter if they are supported or not.
self.files_to_omit = set() # type: Set[Pattern]
# what should the parser do if it encounters an unknown file in
# the archive?
self.unknown_member_policy = UnknownMemberPolicy.ABORT # type: UnknownMemberPolicy
try: # better fail here than later
zipfile.ZipFile(self.filename)
except zipfile.BadZipFile:
raise ValueError
def _specific_cleanup(self, full_path: str) -> bool:
""" This method can be used to apply specific treatment
to files present in the archive."""
# pylint: disable=unused-argument,no-self-use
return True # pragma: no cover
def _specific_get_meta(self, full_path: str, file_path: str) -> Dict[str, Any]:
""" This method can be used to extract specific metadata
from files present in the archive."""
# pylint: disable=unused-argument,no-self-use
return {} # pragma: no cover
@staticmethod
def _clean_zipinfo(zipinfo: zipfile.ZipInfo) -> zipfile.ZipInfo:
zipinfo.create_system = 3 # Linux
zipinfo.comment = b''
zipinfo.date_time = (1980, 1, 1, 0, 0, 0) # this is as early as a zipfile can be
return zipinfo
@staticmethod
def _get_zipinfo_meta(zipinfo: zipfile.ZipInfo) -> Dict[str, str]:
metadata = {}
if zipinfo.create_system == 3: # this is Linux
pass
elif zipinfo.create_system == 2:
metadata['create_system'] = 'Windows'
else:
metadata['create_system'] = 'Weird'
if zipinfo.comment:
metadata['comment'] = zipinfo.comment # type: ignore
if zipinfo.date_time != (1980, 1, 1, 0, 0, 0):
metadata['date_time'] = str(datetime.datetime(*zipinfo.date_time))
return metadata
def get_meta(self) -> Dict[str, Union[str, dict]]:
meta = dict() # type: Dict[str, Union[str, dict]]
with zipfile.ZipFile(self.filename) as zin:
temp_folder = tempfile.mkdtemp()
for item in zin.infolist():
local_meta = dict() # type: Dict[str, Union[str, Dict]]
for k, v in self._get_zipinfo_meta(item).items():
local_meta[k] = v
if item.filename[-1] == '/': # pragma: no cover
# `is_dir` is added in Python3.6
continue # don't keep empty folders
zin.extract(member=item, path=temp_folder)
full_path = os.path.join(temp_folder, item.filename)
specific_meta = self._specific_get_meta(full_path, item.filename)
for (k, v) in specific_meta.items():
local_meta[k] = v
tmp_parser, _ = parser_factory.get_parser(full_path) # type: ignore
if tmp_parser:
for k, v in tmp_parser.get_meta().items():
local_meta[k] = v
if local_meta:
meta[item.filename] = local_meta
shutil.rmtree(temp_folder)
return meta
def remove_all(self) -> bool:
# pylint: disable=too-many-branches
with zipfile.ZipFile(self.filename) as zin,\
zipfile.ZipFile(self.output_filename, 'w') as zout:
temp_folder = tempfile.mkdtemp()
abort = False
# Since files order is a fingerprint factor,
# we're iterating (and thus inserting) them in lexicographic order.
for item in sorted(zin.infolist(), key=lambda z: z.filename):
if item.filename[-1] == '/': # `is_dir` is added in Python3.6
continue # don't keep empty folders
zin.extract(member=item, path=temp_folder)
full_path = os.path.join(temp_folder, item.filename)
if self._specific_cleanup(full_path) is False:
logging.warning("Something went wrong during deep cleaning of %s",
item.filename)
abort = True
continue
if any(map(lambda r: r.search(item.filename), self.files_to_keep)):
# those files aren't supported, but we want to add them anyway
pass
elif any(map(lambda r: r.search(item.filename), self.files_to_omit)):
continue
else: # supported files that we want to first clean, then add
tmp_parser, mtype = parser_factory.get_parser(full_path) # type: ignore
if not tmp_parser:
if self.unknown_member_policy == UnknownMemberPolicy.OMIT:
logging.warning("In file %s, omitting unknown element %s (format: %s)",
self.filename, item.filename, mtype)
continue
elif self.unknown_member_policy == UnknownMemberPolicy.KEEP:
logging.warning("In file %s, keeping unknown element %s (format: %s)",
self.filename, item.filename, mtype)
else:
logging.error("In file %s, element %s's format (%s) " \
"isn't supported",
self.filename, item.filename, mtype)
abort = True
continue
if tmp_parser:
if tmp_parser.remove_all() is False:
logging.warning("In file %s, something went wrong \
with the cleaning of %s \
(format: %s)",
self.filename, item.filename, mtype)
abort = True
continue
os.rename(tmp_parser.output_filename, full_path)
zinfo = zipfile.ZipInfo(item.filename) # type: ignore
clean_zinfo = self._clean_zipinfo(zinfo)
with open(full_path, 'rb') as f:
zout.writestr(clean_zinfo, f.read())
shutil.rmtree(temp_folder)
if abort:
os.remove(self.output_filename)
return False
return True
class ZipParser(ArchiveBasedAbstractParser):
mimetypes = {'application/zip'}
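A usage sketch for the generic zip support above; './backup.zip' is a placeholder, and unknown_member_policy is the attribute that the --unknown-members CLI option described in the man page corresponds to.

from libmat2 import UnknownMemberPolicy
from libmat2.archive import ZipParser

p = ZipParser('./backup.zip')                       # raises ValueError if the file isn't a valid zip
p.unknown_member_policy = UnknownMemberPolicy.OMIT  # drop members MAT2 can't clean
print(p.get_meta())                                 # per-member metadata, keyed by member name
p.remove_all()                                      # writes ./backup.cleaned.zip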

View File

@@ -1,8 +1,12 @@
import mimetypes
import os
import shutil
import tempfile
from typing import Dict, Union
import mutagen
from . import abstract
from . import abstract, parser_factory
class MutagenParser(abstract.AbstractParser):
@@ -13,13 +17,13 @@ class MutagenParser(abstract.AbstractParser):
except mutagen.MutagenError:
raise ValueError
def get_meta(self):
def get_meta(self) -> Dict[str, Union[str, dict]]:
f = mutagen.File(self.filename)
if f.tags:
return {k:', '.join(v) for k, v in f.tags.items()}
return {}
def remove_all(self):
def remove_all(self) -> bool:
shutil.copy(self.filename, self.output_filename)
f = mutagen.File(self.output_filename)
f.delete()
@@ -30,8 +34,8 @@ class MutagenParser(abstract.AbstractParser):
class MP3Parser(MutagenParser):
mimetypes = {'audio/mpeg', }
def get_meta(self):
metadata = {}
def get_meta(self) -> Dict[str, Union[str, dict]]:
metadata = {} # type: Dict[str, Union[str, dict]]
meta = mutagen.File(self.filename).tags
for key in meta:
metadata[key.rstrip(' \t\r\n\0')] = ', '.join(map(str, meta[key].text))
@@ -44,3 +48,30 @@ class OGGParser(MutagenParser):
class FLACParser(MutagenParser):
mimetypes = {'audio/flac', 'audio/x-flac'}
def remove_all(self) -> bool:
shutil.copy(self.filename, self.output_filename)
f = mutagen.File(self.output_filename)
f.clear_pictures()
f.delete()
f.save(deleteid3=True)
return True
def get_meta(self) -> Dict[str, Union[str, dict]]:
meta = super().get_meta()
for num, picture in enumerate(mutagen.File(self.filename).pictures):
name = picture.desc if picture.desc else 'Cover %d' % num
extension = mimetypes.guess_extension(picture.mime)
if extension is None: # pragma: no cover
meta[name] = 'harmful data'
continue
_, fname = tempfile.mkstemp()
fname = fname + extension
with open(fname, 'wb') as f:
f.write(picture.data)
p, _ = parser_factory.get_parser(fname) # type: ignore
# Mypy chokes on ternaries :/
meta[name] = p.get_meta() if p else 'harmful data' # type: ignore
os.remove(fname)
return meta
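To see the recursive cover-art inspection above in action, a short sketch; './song.flac' is a placeholder path.

from libmat2.audio import FLACParser

p = FLACParser('./song.flac')  # raises ValueError if mutagen can't open the file
print(p.get_meta())            # tags, plus nested metadata for every embedded cover picture
p.remove_all()                 # copies to ./song.cleaned.flac, then strips tags and pictures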

libmat2/exiftool.py (new file, 70 lines)
View File

@@ -0,0 +1,70 @@
import json
import logging
import os
from typing import Dict, Union, Set
from . import abstract
from . import subprocess
# Make pyflakes happy
assert Set
class ExiftoolParser(abstract.AbstractParser):
""" Exiftool is often the easiest way to get all the metadata
from a file, which is why several parsers are re-using its `get_meta`
method.
"""
meta_whitelist = set() # type: Set[str]
def get_meta(self) -> Dict[str, Union[str, dict]]:
out = subprocess.run([_get_exiftool_path(), '-json', self.filename],
input_filename=self.filename,
check=True, stdout=subprocess.PIPE).stdout
meta = json.loads(out.decode('utf-8'))[0]
for key in self.meta_whitelist:
meta.pop(key, None)
return meta
def _lightweight_cleanup(self) -> bool:
if os.path.exists(self.output_filename):
try:
# exiftool can't force output to existing files
os.remove(self.output_filename)
except OSError as e: # pragma: no cover
logging.error("The output file %s is already existing and \
can't be overwritten: %s.", self.filename, e)
return False
# Note: '-All=' must be followed by a known exiftool option.
# Also, '-CommonIFD0' is needed for .tiff files
cmd = [_get_exiftool_path(),
'-all=', # remove metadata
'-adobe=', # remove adobe-specific metadata
'-exif:all=', # remove all exif metadata
'-Time:All=', # remove all timestamps
'-quiet', # don't show useless logs
'-CommonIFD0=', # remove IFD0 metadata
'-o', self.output_filename,
self.filename]
try:
subprocess.run(cmd, check=True,
input_filename=self.filename,
output_filename=self.output_filename)
except subprocess.CalledProcessError as e: # pragma: no cover
logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
return False
return True
def _get_exiftool_path() -> str: # pragma: no cover
possible_pathes = {
'/usr/bin/exiftool', # debian/fedora
'/usr/bin/vendor_perl/exiftool', # archlinux
}
for possible_path in possible_pathes:
if os.path.isfile(possible_path):
if os.access(possible_path, os.X_OK):
return possible_path
raise RuntimeError("Unable to find exiftool")

View File

@@ -1,13 +1,13 @@
import shutil
from typing import Dict
from typing import Dict, Union
from . import abstract
class HarmlessParser(abstract.AbstractParser):
""" This is the parser for filetypes that do not contain metadata. """
""" This is the parser for filetypes that can not contain metadata. """
mimetypes = {'text/plain', 'image/x-ms-bmp'}
def get_meta(self) -> Dict[str, str]:
def get_meta(self) -> Dict[str, Union[str, dict]]:
return dict()
def remove_all(self) -> bool:

libmat2/html.py (new file, 69 lines)
View File

@@ -0,0 +1,69 @@
from html import parser
from typing import Dict, Any, List, Tuple
from . import abstract
class HTMLParser(abstract.AbstractParser):
mimetypes = {'text/html', }
def __init__(self, filename):
super().__init__(filename)
self.__parser = _HTMLParser()
with open(filename) as f:
self.__parser.feed(f.read())
self.__parser.close()
def get_meta(self) -> Dict[str, Any]:
return self.__parser.get_meta()
def remove_all(self) -> bool:
return self.__parser.remove_all(self.output_filename)
class _HTMLParser(parser.HTMLParser):
"""Python doesn't have a validating html parser in its stdlib, so
we're using an internal queue to track all the opening/closing tags,
and hoping for the best.
"""
def __init__(self):
super().__init__()
self.__textrepr = ''
self.__meta = {}
self.__validation_queue = []
def handle_starttag(self, tag: str, attrs: List[Tuple[str, str]]):
self.__textrepr += self.get_starttag_text()
self.__validation_queue.append(tag)
def handle_endtag(self, tag: str):
if not self.__validation_queue:
raise ValueError
elif tag != self.__validation_queue.pop():
raise ValueError
# There is no `get_endtag_text()` method :/
self.__textrepr += '</' + tag + '>\n'
def handle_data(self, data: str):
if data.strip():
self.__textrepr += data
def handle_startendtag(self, tag: str, attrs: List[Tuple[str, str]]):
if tag == 'meta':
meta = {k:v for k, v in attrs}
name = meta.get('name', 'harmful metadata')
content = meta.get('content', 'harmful data')
self.__meta[name] = content
else:
self.__textrepr += self.get_starttag_text()
def remove_all(self, output_filename: str) -> bool:
if self.__validation_queue:
raise ValueError
with open(output_filename, 'w') as f:
f.write(self.__textrepr)
return True
def get_meta(self) -> Dict[str, Any]:
if self.__validation_queue:
raise ValueError
return self.__meta
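A short usage sketch for the HTML support above; './page.html' is a placeholder path and the printed dict is purely illustrative.

from libmat2 import html

p = html.HTMLParser('./page.html')  # raises ValueError on unbalanced tags
print(p.get_meta())                 # e.g. {'generator': 'SomeEditor 1.2'}
p.remove_all()                      # writes ./page.cleaned.html, with the <meta> tags dropped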

View File

@@ -1,48 +1,19 @@
import subprocess
import imghdr
import json
import os
import shutil
import tempfile
import re
from typing import Set
import cairo
import gi
gi.require_version('GdkPixbuf', '2.0')
from gi.repository import GdkPixbuf
from gi.repository import GdkPixbuf, GLib
from . import abstract
from . import exiftool
# Make pyflakes happy
assert Set
class _ImageParser(abstract.AbstractParser):
@staticmethod
def __handle_problematic_filename(filename: str, callback) -> str:
""" This method takes a filename with a problematic name,
and safely applies it a `callback`."""
tmpdirname = tempfile.mkdtemp()
fname = os.path.join(tmpdirname, "temp_file")
shutil.copy(filename, fname)
out = callback(fname)
shutil.rmtree(tmpdirname)
return out
def get_meta(self):
""" There is no way to escape the leading(s) dash(es) of the current
self.filename to prevent parameter injections, so we need to take care
of this.
"""
fun = lambda f: subprocess.check_output(['/usr/bin/exiftool', '-json', f])
if re.search('^[a-z0-9/]', self.filename) is None:
out = self.__handle_problematic_filename(self.filename, fun)
else:
out = fun(self.filename)
meta = json.loads(out.decode('utf-8'))[0]
for key in self.meta_whitelist:
meta.pop(key, None)
return meta
class PNGParser(_ImageParser):
class PNGParser(exiftool.ExiftoolParser):
mimetypes = {'image/png', }
meta_whitelist = {'SourceFile', 'ExifToolVersion', 'FileName',
'Directory', 'FileSize', 'FileModifyDate',
@@ -54,36 +25,63 @@ class PNGParser(_ImageParser):
def __init__(self, filename):
super().__init__(filename)
try: # better fail here than later
cairo.ImageSurface.create_from_png(self.filename)
except MemoryError:
if imghdr.what(filename) != 'png':
raise ValueError
def remove_all(self):
try: # better fail here than later
cairo.ImageSurface.create_from_png(self.filename)
except MemoryError: # pragma: no cover
raise ValueError
def remove_all(self) -> bool:
if self.lightweight_cleaning:
return self._lightweight_cleanup()
surface = cairo.ImageSurface.create_from_png(self.filename)
surface.write_to_png(self.output_filename)
return True
class GdkPixbufAbstractParser(_ImageParser):
class GIFParser(exiftool.ExiftoolParser):
mimetypes = {'image/gif'}
meta_whitelist = {'AnimationIterations', 'BackgroundColor', 'BitsPerPixel',
'ColorResolutionDepth', 'Directory', 'Duration',
'ExifToolVersion', 'FileAccessDate',
'FileInodeChangeDate', 'FileModifyDate', 'FileName',
'FilePermissions', 'FileSize', 'FileType',
'FileTypeExtension', 'FrameCount', 'GIFVersion',
'HasColorMap', 'ImageHeight', 'ImageSize', 'ImageWidth',
'MIMEType', 'Megapixels', 'SourceFile',}
def remove_all(self) -> bool:
return self._lightweight_cleanup()
class GdkPixbufAbstractParser(exiftool.ExiftoolParser):
""" GdkPixbuf can handle a lot of surfaces, so we're rending images on it,
this has the side-effect of removing metadata completely.
this has the side-effect of completely removing metadata.
"""
_type = ''
def remove_all(self):
_, extension = os.path.splitext(self.filename)
pixbuf = GdkPixbuf.Pixbuf.new_from_file(self.filename)
if extension == '.jpg':
extension = '.jpeg' # gdk is picky
pixbuf.savev(self.output_filename, extension[1:], [], [])
return True
def __init__(self, filename):
super().__init__(filename)
if imghdr.what(filename) != self._type: # better safe than sorry
# we can't use imghdr here because of https://bugs.python.org/issue28591
try:
GdkPixbuf.Pixbuf.new_from_file(self.filename)
except GLib.GError:
raise ValueError
def remove_all(self) -> bool:
if self.lightweight_cleaning:
return self._lightweight_cleanup()
_, extension = os.path.splitext(self.filename)
pixbuf = GdkPixbuf.Pixbuf.new_from_file(self.filename)
if extension.lower() == '.jpg':
extension = '.jpeg' # gdk is picky
pixbuf.savev(self.output_filename, type=extension[1:], option_keys=[], option_values=[])
return True
class JPGParser(GdkPixbufAbstractParser):
_type = 'jpeg'

View File

@@ -1,124 +1,46 @@
import logging
import os
import re
import shutil
import tempfile
import datetime
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Set, Pattern
from typing import Dict, Set, Pattern, Tuple, Any
import xml.etree.ElementTree as ET # type: ignore
from . import abstract, parser_factory
from .archive import ArchiveBasedAbstractParser
# pylint: disable=line-too-long
# Make pyflakes happy
assert Set
assert Pattern
def _parse_xml(full_path: str):
""" This function parse XML with namespace support. """
def parse_map(f): # etree support for ns is a bit rough
ns_map = dict()
for event, (k, v) in ET.iterparse(f, ("start-ns", )):
if event == "start-ns":
ns_map[k] = v
return ns_map
def _parse_xml(full_path: str) -> Tuple[ET.ElementTree, Dict[str, str]]:
""" This function parses XML, with namespace support. """
namespace_map = dict()
for _, (key, value) in ET.iterparse(full_path, ("start-ns", )):
# The ns[0-9]+ namespaces are reserved for internal usage, so
# we have to use another nomenclature.
if re.match('^ns[0-9]+$', key, re.I): # pragma: no cover
key = 'mat' + key[2:]
ns = parse_map(full_path)
namespace_map[key] = value
ET.register_namespace(key, value)
# Register the namespaces
for k, v in ns.items():
ET.register_namespace(k, v)
return ET.parse(full_path), ns
return ET.parse(full_path), namespace_map
class ArchiveBasedAbstractParser(abstract.AbstractParser):
# Those are the files that have a format that _isn't_
# supported by MAT2, but that we want to keep anyway.
files_to_keep = set() # type: Set[str]
def _sort_xml_attributes(full_path: str) -> bool:
""" Sort xml attributes lexicographically,
because it's possible to fingerprint producers (MS Office, Libreoffice, …)
since they are all using different orders.
"""
tree = ET.parse(full_path)
# Those are the files that we _do not_ want to keep,
# no matter if they are supported or not.
files_to_omit = set() # type: Set[Pattern]
for c in tree.getroot():
c[:] = sorted(c, key=lambda child: (child.tag, child.get('desc')))
def __init__(self, filename):
super().__init__(filename)
try: # better fail here than later
zipfile.ZipFile(self.filename)
except zipfile.BadZipFile:
raise ValueError
def _specific_cleanup(self, full_path: str) -> bool:
""" This method can be used to apply specific treatment
to files present in the archive."""
return True
def _clean_zipinfo(self, zipinfo: zipfile.ZipInfo) -> zipfile.ZipInfo:
zipinfo.create_system = 3 # Linux
zipinfo.comment = b''
zipinfo.date_time = (1980, 1, 1, 0, 0, 0)
return zipinfo
def _get_zipinfo_meta(self, zipinfo: zipfile.ZipInfo) -> Dict[str, str]:
metadata = {}
if zipinfo.create_system == 3:
#metadata['create_system'] = 'Linux'
pass
elif zipinfo.create_system == 2:
metadata['create_system'] = 'Windows'
else:
metadata['create_system'] = 'Weird'
if zipinfo.comment:
metadata['comment'] = zipinfo.comment # type: ignore
if zipinfo.date_time != (1980, 1, 1, 0, 0, 0):
metadata['date_time'] = str(datetime.datetime(*zipinfo.date_time))
return metadata
def remove_all(self) -> bool:
with zipfile.ZipFile(self.filename) as zin,\
zipfile.ZipFile(self.output_filename, 'w') as zout:
temp_folder = tempfile.mkdtemp()
for item in zin.infolist():
if item.filename[-1] == '/': # `is_dir` is added in Python3.6
continue # don't keep empty folders
zin.extract(member=item, path=temp_folder)
full_path = os.path.join(temp_folder, item.filename)
if self._specific_cleanup(full_path) is False:
shutil.rmtree(temp_folder)
os.remove(self.output_filename)
print("Something went wrong during deep cleaning of %s" % item.filename)
return False
if item.filename in self.files_to_keep:
# those files aren't supported, but we want to add them anyway
pass
elif any(map(lambda r: r.search(item.filename), self.files_to_omit)):
continue
else:
# supported files that we want to clean then add
tmp_parser, mtype = parser_factory.get_parser(full_path) # type: ignore
if not tmp_parser:
shutil.rmtree(temp_folder)
os.remove(self.output_filename)
print("%s's format (%s) isn't supported" % (item.filename, mtype))
return False
tmp_parser.remove_all()
os.rename(tmp_parser.output_filename, full_path)
zinfo = zipfile.ZipInfo(item.filename) # type: ignore
clean_zinfo = self._clean_zipinfo(zinfo)
with open(full_path, 'rb') as f:
zout.writestr(clean_zinfo, f.read())
shutil.rmtree(temp_folder)
return True
tree.write(full_path, xml_declaration=True)
return True
class MSOfficeParser(ArchiveBasedAbstractParser):
@@ -127,80 +49,267 @@ class MSOfficeParser(ArchiveBasedAbstractParser):
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'application/vnd.openxmlformats-officedocument.presentationml.presentation'
}
files_to_keep = {
'[Content_Types].xml',
'_rels/.rels',
'word/_rels/document.xml.rels',
'word/document.xml',
'word/fontTable.xml',
'word/settings.xml',
'word/styles.xml',
content_types_to_keep = {
'application/vnd.openxmlformats-officedocument.wordprocessingml.endnotes+xml', # /word/endnotes.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.footnotes+xml', # /word/footnotes.xml
'application/vnd.openxmlformats-officedocument.extended-properties+xml', # /docProps/app.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml', # /word/document.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml', # /word/fontTable.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml', # /word/footer.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml', # /word/header.xml
'application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml', # /word/styles.xml
'application/vnd.openxmlformats-package.core-properties+xml', # /docProps/core.xml
# Do we want to keep the following ones?
'application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml',
# See https://0xacab.org/jvoisin/mat2/issues/71
'application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml', # /word/numbering.xml
}
files_to_omit = set(map(re.compile, { # type: ignore
'^docProps/',
}))
def __remove_revisions(self, full_path: str) -> bool:
""" In this function, we're changing the XML
document in two times, since we don't want
to change the tree we're iterating on."""
tree, ns = _parse_xml(full_path)
# No revisions are present
if tree.find('.//w:del', ns) is None:
return True
elif tree.find('.//w:ins', ns) is None:
def __init__(self, filename):
super().__init__(filename)
self.files_to_keep = set(map(re.compile, { # type: ignore
r'^\[Content_Types\]\.xml$',
r'^_rels/\.rels$',
r'^word/_rels/document\.xml\.rels$',
r'^word/_rels/footer[0-9]*\.xml\.rels$',
r'^word/_rels/header[0-9]*\.xml\.rels$',
# https://msdn.microsoft.com/en-us/library/dd908153(v=office.12).aspx
r'^word/stylesWithEffects\.xml$',
}))
self.files_to_omit = set(map(re.compile, { # type: ignore
r'^customXml/',
r'webSettings\.xml$',
r'^docProps/custom\.xml$',
r'^word/printerSettings/',
r'^word/theme',
r'^word/people\.xml$',
# we have a whitelist in self.files_to_keep,
# so we can trash everything else
r'^word/_rels/',
}))
if self.__fill_files_to_keep_via_content_types() is False:
raise ValueError
def __fill_files_to_keep_via_content_types(self) -> bool:
""" There is a suer-handy `[Content_Types].xml` file
in MS Office archives, describing what each other file contains.
The self.content_types_to_keep member contains a type whitelist,
so we're using it to fill the self.files_to_keep one.
"""
with zipfile.ZipFile(self.filename) as zin:
if '[Content_Types].xml' not in zin.namelist():
return False
xml_data = zin.read('[Content_Types].xml')
self.content_types = dict() # type: Dict[str, str]
try:
tree = ET.fromstring(xml_data)
except ET.ParseError:
return False
for c in tree:
if 'PartName' not in c.attrib or 'ContentType' not in c.attrib:
continue
elif c.attrib['ContentType'] in self.content_types_to_keep:
fname = c.attrib['PartName'][1:] # remove leading `/`
re_fname = re.compile('^' + re.escape(fname) + '$')
self.files_to_keep.add(re_fname) # type: ignore
return True
@staticmethod
def __remove_rsid(full_path: str) -> bool:
""" The method will remove "revision session ID". We're '}rsid'
instead of proper parsing, since rsid can have multiple forms, like
`rsidRDefault`, `rsidR`, `rsids`, …
We're removing rsid tags in two passes, because we can't modify
the xml while we're iterating on it.
For more details, see
- https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
- https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/
"""
try:
tree, namespace = _parse_xml(full_path)
except ET.ParseError:
return False
# rsid, tags or attributes, are always under the `w` namespace
if 'w' not in namespace.keys():
return True
parent_map = {c:p for p in tree.iter() for c in p}
elements = list([element for element in tree.iterfind('.//w:del', ns)])
for element in elements:
elements_to_remove = list()
for item in tree.iterfind('.//', namespace):
if '}rsid' in item.tag.strip().lower(): # rsid as tag
elements_to_remove.append(item)
continue
for key in list(item.attrib.keys()): # rsid as attribute
if '}rsid' in key.lower():
del item.attrib[key]
for element in elements_to_remove:
parent_map[element].remove(element)
elements = list()
for element in tree.iterfind('.//w:ins', ns):
for position, item in enumerate(tree.iter()):
tree.write(full_path, xml_declaration=True)
return True
@staticmethod
def __remove_revisions(full_path: str) -> bool:
""" In this function, we're changing the XML document in several
passes, since we don't want to change the tree we're currently
iterating on.
"""
try:
tree, namespace = _parse_xml(full_path)
except ET.ParseError as e:
logging.error("Unable to parse %s: %s", full_path, e)
return False
# Revisions are either deletions (`w:del`) or
# insertions (`w:ins`)
del_presence = tree.find('.//w:del', namespace)
ins_presence = tree.find('.//w:ins', namespace)
if del_presence is None and ins_presence is None:
return True # No revisions are present
parent_map = {c:p for p in tree.iter() for c in p}
elements_del = list()
for element in tree.iterfind('.//w:del', namespace):
elements_del.append(element)
for element in elements_del:
parent_map[element].remove(element)
elements_ins = list()
for element in tree.iterfind('.//w:ins', namespace):
for position, item in enumerate(tree.iter()): # pragma: no cover
if item == element:
for children in element.iterfind('./*'):
elements.append((element, position, children))
elements_ins.append((element, position, children))
break
for (element, position, children) in elements:
for (element, position, children) in elements_ins:
parent_map[element].insert(position, children)
parent_map[element].remove(element)
tree.write(full_path, xml_declaration=True)
return True
def __remove_content_type_members(self, full_path: str) -> bool:
""" The method will remove the dangling references
from the [Content_Types].xml file, since MS Office doesn't like them
"""
try:
tree, namespace = _parse_xml(full_path)
except ET.ParseError: # pragma: no cover
return False
if len(namespace.items()) != 1:
return False # there should be only one namespace for Types
removed_fnames = set()
with zipfile.ZipFile(self.filename) as zin:
for fname in [item.filename for item in zin.infolist()]:
for file_to_omit in self.files_to_omit:
if file_to_omit.search(fname):
matches = map(lambda r: r.search(fname), self.files_to_keep)
if any(matches): # the file is whitelisted
continue
removed_fnames.add(fname)
break
root = tree.getroot()
for item in root.findall('{%s}Override' % namespace['']):
name = item.attrib['PartName'][1:] # remove the leading '/'
if name in removed_fnames:
root.remove(item)
tree.write(full_path, xml_declaration=True)
return True
def _specific_cleanup(self, full_path: str) -> bool:
if full_path.endswith('/word/document.xml'):
return self.__remove_revisions(full_path)
# pylint: disable=too-many-return-statements
if os.stat(full_path).st_size == 0: # Don't process empty files
return True
if not full_path.endswith('.xml'):
return True
if full_path.endswith('/[Content_Types].xml'):
# this file contains references to files that we might
# remove, and MS Office doesn't like dangling references
if self.__remove_content_type_members(full_path) is False:
return False
elif full_path.endswith('/word/document.xml'):
# this file contains the revisions
if self.__remove_revisions(full_path) is False:
return False
elif full_path.endswith('/docProps/app.xml'):
# This file must be present and valid,
# so we're removing as much as we can.
with open(full_path, 'wb') as f:
f.write(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>')
f.write(b'<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">')
f.write(b'</Properties>')
elif full_path.endswith('/docProps/core.xml'):
# This file must be present and valid,
# so we're removing as much as we can.
with open(full_path, 'wb') as f:
f.write(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>')
f.write(b'<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties">')
f.write(b'</cp:coreProperties>')
if self.__remove_rsid(full_path) is False:
return False
try:
_sort_xml_attributes(full_path)
except ET.ParseError as e: # pragma: no cover
logging.error("Unable to parse %s: %s", full_path, e)
return False
# This is awful, I'm sorry.
#
# Microsoft Office isn't happy when we have the `mc:Ignorable`
# tag containing namespaces that aren't present in the xml file,
# so instead of trying to remove this specific tag with etree,
# we're removing it, with a regexp.
#
# Since we're the ones producing this file, via the call to
# _sort_xml_attributes, there won't be any "funny tricks".
# Worst case, the tag isn't present, and everything is fine.
#
# see: https://docs.microsoft.com/en-us/dotnet/framework/wpf/advanced/mc-ignorable-attribute
with open(full_path, 'rb') as f:
text = f.read()
out = re.sub(b'mc:Ignorable="[^"]*"', b'', text, 1)
with open(full_path, 'wb') as f:
f.write(out)
return True
def get_meta(self) -> Dict[str, str]:
def _specific_get_meta(self, full_path: str, file_path: str) -> Dict[str, Any]:
"""
Yes, I know that parsing xml with regexp ain't pretty,
be my guest and fix it if you want.
"""
metadata = {}
zipin = zipfile.ZipFile(self.filename)
for item in zipin.infolist():
if item.filename.startswith('docProps/') and item.filename.endswith('.xml'):
content = zipin.read(item).decode('utf-8')
try:
results = re.findall(r"<(.+)>(.+)</\1>", content, re.I|re.M)
for (key, value) in results:
metadata[key] = value
except TypeError: # We didn't manage to parse the xml file
pass
if not metadata: # better safe than sorry
metadata[item] = 'harmful content'
for key, value in self._get_zipinfo_meta(item).items():
metadata[key] = value
zipin.close()
return metadata
if not file_path.startswith('docProps/') or not file_path.endswith('.xml'):
return {}
with open(full_path, encoding='utf-8') as f:
try:
results = re.findall(r"<(.+)>(.+)</\1>", f.read(), re.I|re.M)
return {k:v for (k, v) in results}
except (TypeError, UnicodeDecodeError):
# We didn't manage to parse the xml file
return {file_path: 'harmful content', }
class LibreOfficeParser(ArchiveBasedAbstractParser):
@@ -213,59 +322,70 @@ class LibreOfficeParser(ArchiveBasedAbstractParser):
'application/vnd.oasis.opendocument.formula',
'application/vnd.oasis.opendocument.image',
}
files_to_keep = {
'META-INF/manifest.xml',
'content.xml',
'manifest.rdf',
'mimetype',
'settings.xml',
'styles.xml',
}
files_to_omit = set(map(re.compile, { # type: ignore
r'^meta\.xml$',
'^Configurations2/',
'^Thumbnails/',
}))
def __remove_revisions(self, full_path: str) -> bool:
tree, ns = _parse_xml(full_path)
def __init__(self, filename):
super().__init__(filename)
if 'office' not in ns.keys(): # no revisions in the current file
self.files_to_keep = set(map(re.compile, { # type: ignore
r'^META-INF/manifest\.xml$',
r'^content\.xml$',
r'^manifest\.rdf$',
r'^mimetype$',
r'^settings\.xml$',
r'^styles\.xml$',
}))
self.files_to_omit = set(map(re.compile, { # type: ignore
r'^meta\.xml$',
r'^Configurations2/',
r'^Thumbnails/',
}))
@staticmethod
def __remove_revisions(full_path: str) -> bool:
try:
tree, namespace = _parse_xml(full_path)
except ET.ParseError as e:
logging.error("Unable to parse %s: %s", full_path, e)
return False
if 'office' not in namespace.keys(): # no revisions in the current file
return True
for text in tree.getroot().iterfind('.//office:text', ns):
for changes in text.iterfind('.//text:tracked-changes', ns):
for text in tree.getroot().iterfind('.//office:text', namespace):
for changes in text.iterfind('.//text:tracked-changes', namespace):
text.remove(changes)
tree.write(full_path, xml_declaration=True)
return True
def _specific_cleanup(self, full_path: str) -> bool:
if os.path.basename(full_path) == 'content.xml':
return self.__remove_revisions(full_path)
if os.stat(full_path).st_size == 0: # Don't process empty files
return True
if os.path.basename(full_path).endswith('.xml'):
if os.path.basename(full_path) == 'content.xml':
if self.__remove_revisions(full_path) is False:
return False
try:
_sort_xml_attributes(full_path)
except ET.ParseError as e:
logging.error("Unable to parse %s: %s", full_path, e)
return False
return True
def get_meta(self) -> Dict[str, str]:
def _specific_get_meta(self, full_path: str, file_path: str) -> Dict[str, Any]:
"""
Yes, I know that parsing xml with regexp ain't pretty,
be my guest and fix it if you want.
"""
metadata = {}
zipin = zipfile.ZipFile(self.filename)
for item in zipin.infolist():
if item.filename == 'meta.xml':
content = zipin.read(item).decode('utf-8')
try:
results = re.findall(r"<((?:meta|dc|cp).+?)>(.+)</\1>", content, re.I|re.M)
for (key, value) in results:
metadata[key] = value
except TypeError: # We didn't manage to parse the xml file
pass
if not metadata: # better safe than sorry
metadata[item] = 'harmful content'
for key, value in self._get_zipinfo_meta(item).items():
metadata[key] = value
zipin.close()
return metadata
if file_path != 'meta.xml':
return {}
with open(full_path, encoding='utf-8') as f:
try:
results = re.findall(r"<((?:meta|dc|cp).+?)[^>]*>(.+)</\1>", f.read(), re.I|re.M)
return {k:v for (k, v) in results}
except (TypeError, UnicodeDecodeError): # We didn't manage to parse the xml file
# We didn't manage to parse the xml file
return {file_path: 'harmful content', }
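Putting the office pieces above together, a hedged usage sketch; './report.docx' is a placeholder path.

from libmat2.office import MSOfficeParser

p = MSOfficeParser('./report.docx')  # raises ValueError if the file isn't a well-formed archive
print(p.get_meta())                  # docProps fields, zip member timestamps, embedded file metadata, …
p.remove_all()                       # writes ./report.cleaned.docx, with revisions and rsid tags stripped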

View File

@@ -4,24 +4,31 @@ import mimetypes
import importlib
from typing import TypeVar, List, Tuple, Optional
from . import abstract, unsupported_extensions
from . import abstract, UNSUPPORTED_EXTENSIONS
assert Tuple # make pyflakes happy
T = TypeVar('T', bound='abstract.AbstractParser')
def __load_all_parsers():
""" Loads every parser in a dynamic way """
current_dir = os.path.dirname(__file__)
for name in glob.glob(os.path.join(current_dir, '*.py')):
if name.endswith('abstract.py') or name.endswith('__init__.py'):
for fname in glob.glob(os.path.join(current_dir, '*.py')):
if fname.endswith('abstract.py'):
continue
basename = os.path.basename(name)
elif fname.endswith('__init__.py'):
continue
elif fname.endswith('exiftool.py'):
continue
basename = os.path.basename(fname)
name, _ = os.path.splitext(basename)
importlib.import_module('.' + name, package='libmat2')
__load_all_parsers()
def _get_parsers() -> List[T]:
""" Get all our parsers!"""
def __get_parsers(cls):
@@ -31,10 +38,11 @@ def _get_parsers() -> List[T]:
def get_parser(filename: str) -> Tuple[Optional[T], Optional[str]]:
""" Return the appropriate parser for a giver filename. """
mtype, _ = mimetypes.guess_type(filename)
_, extension = os.path.splitext(filename)
if extension in unsupported_extensions:
if extension.lower() in UNSUPPORTED_EXTENSIONS:
return None, mtype
for parser_class in _get_parsers(): # type: ignore

View File

@@ -7,6 +7,7 @@ import re
import logging
import tempfile
import io
from typing import Dict, Union
from distutils.version import LooseVersion
import cairo
@@ -16,12 +17,10 @@ from gi.repository import Poppler, GLib
from . import abstract
logging.basicConfig(level=logging.DEBUG)
poppler_version = Poppler.get_version()
if LooseVersion(poppler_version) < LooseVersion('0.46'):
if LooseVersion(poppler_version) < LooseVersion('0.46'): # pragma: no cover
raise ValueError("MAT2 needs at least Poppler version 0.46 to work. \
The installed version is %s." % poppler_version)
The installed version is %s." % poppler_version) # pragma: no cover
class PDFParser(abstract.AbstractParser):
@@ -39,7 +38,12 @@ class PDFParser(abstract.AbstractParser):
except GLib.GError: # Invalid PDF
raise ValueError
def remove_all_lightweight(self):
def remove_all(self) -> bool:
if self.lightweight_cleaning is True:
return self.__remove_all_lightweight()
return self.__remove_all_thorough()
def __remove_all_lightweight(self) -> bool:
"""
Load the document into Poppler, render pages on a new PDFSurface.
"""
@@ -47,7 +51,7 @@ class PDFParser(abstract.AbstractParser):
pages_count = document.get_n_pages()
tmp_path = tempfile.mkstemp()[1]
pdf_surface = cairo.PDFSurface(tmp_path, 10, 10)
pdf_surface = cairo.PDFSurface(tmp_path, 10, 10) # resized later anyway
pdf_context = cairo.Context(pdf_surface) # context draws on the surface
for pagenum in range(pages_count):
@@ -66,7 +70,7 @@ class PDFParser(abstract.AbstractParser):
return True
def remove_all(self):
def __remove_all_thorough(self) -> bool:
"""
Load the document into Poppler, render pages on PNG,
and shove those PNG into a new PDF.
@@ -101,7 +105,7 @@ class PDFParser(abstract.AbstractParser):
pdf_surface.set_size(page_width*self.__scale, page_height*self.__scale)
pdf_context.set_source_surface(img, 0, 0)
pdf_context.paint()
pdf_context.show_page()
pdf_context.show_page() # draw pdf_context on pdf_surface
pdf_surface.finish()
@@ -120,15 +124,14 @@ class PDFParser(abstract.AbstractParser):
document.save('file://' + os.path.abspath(out_file))
return True
@staticmethod
def __parse_metadata_field(data: str) -> dict:
def __parse_metadata_field(data: str) -> Dict[str, str]:
metadata = {}
for (_, key, value) in re.findall(r"<(xmp|pdfx|pdf|xmpMM):(.+)>(.+)</\1:\2>", data, re.I):
metadata[key] = value
return metadata
def get_meta(self):
def get_meta(self) -> Dict[str, Union[str, dict]]:
""" Return a dict with all the meta of the file
"""
metadata = {}

libmat2/subprocess.py (new file, 105 lines)
View File

@@ -0,0 +1,105 @@
"""
Wrapper around a subset of the subprocess module,
that uses bwrap (bubblewrap) when it is available.
Instead of importing subprocess, other modules should use this as follows:
from . import subprocess
"""
import os
import shutil
import subprocess
import tempfile
from typing import List, Optional
__all__ = ['PIPE', 'run', 'CalledProcessError']
PIPE = subprocess.PIPE
CalledProcessError = subprocess.CalledProcessError
def _get_bwrap_path() -> str:
bwrap_path = '/usr/bin/bwrap'
if os.path.isfile(bwrap_path):
if os.access(bwrap_path, os.X_OK):
return bwrap_path
raise RuntimeError("Unable to find bwrap") # pragma: no cover
# pylint: disable=bad-whitespace
def _get_bwrap_args(tempdir: str,
input_filename: str,
output_filename: Optional[str] = None) -> List[str]:
ro_bind_args = []
cwd = os.getcwd()
# XXX: use --ro-bind-try once all supported platforms
# have a bubblewrap recent enough to support it.
ro_bind_dirs = ['/usr', '/lib', '/lib64', '/bin', '/sbin', cwd]
for bind_dir in ro_bind_dirs:
if os.path.isdir(bind_dir): # pragma: no cover
ro_bind_args.extend(['--ro-bind', bind_dir, bind_dir])
ro_bind_files = ['/etc/ld.so.cache']
for bind_file in ro_bind_files:
if os.path.isfile(bind_file): # pragma: no cover
ro_bind_args.extend(['--ro-bind', bind_file, bind_file])
args = ro_bind_args + \
['--dev', '/dev',
'--chdir', cwd,
'--unshare-all',
'--new-session',
# XXX: enable --die-with-parent once all supported platforms have
# a bubblewrap recent enough to support it.
# '--die-with-parent',
]
if output_filename:
# Mount an empty temporary directory where the sandboxed
# process will create its output file
output_dirname = os.path.dirname(os.path.abspath(output_filename))
args.extend(['--bind', tempdir, output_dirname])
absolute_input_filename = os.path.abspath(input_filename)
args.extend(['--ro-bind', absolute_input_filename, absolute_input_filename])
return args
# pylint: disable=bad-whitespace
def run(args: List[str],
input_filename: str,
output_filename: Optional[str] = None,
**kwargs) -> subprocess.CompletedProcess:
"""Wrapper around `subprocess.run`, that uses bwrap (bubblewrap) if it
is available.
Extra supported keyword arguments:
- `input_filename`, made available read-only in the sandbox
- `output_filename`, where the file created by the sandboxed process
is copied upon successful completion; an empty temporary directory
is made visible as the parent directory of this file in the sandbox.
Optional: one valid use case is to invoke an external process
to inspect metadata present in a file.
"""
try:
bwrap_path = _get_bwrap_path()
except RuntimeError: # pragma: no cover
# bubblewrap is not installed ⇒ short-circuit
return subprocess.run(args, **kwargs)
with tempfile.TemporaryDirectory() as tempdir:
prefix_args = [bwrap_path] + \
_get_bwrap_args(input_filename=input_filename,
output_filename=output_filename,
tempdir=tempdir)
completed_process = subprocess.run(prefix_args + args, **kwargs)
if output_filename and completed_process.returncode == 0:
shutil.copy(os.path.join(tempdir, os.path.basename(output_filename)),
output_filename)
return completed_process
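This is how the other modules consume the wrapper (see libmat2/exiftool.py above); a minimal sketch, assuming exiftool is installed at /usr/bin/exiftool and ./photo.jpg exists.

from libmat2 import subprocess  # the sandboxing wrapper above, not the stdlib module

out = subprocess.run(['/usr/bin/exiftool', '-json', './photo.jpg'],
                     input_filename='./photo.jpg',  # bind-mounted read-only inside the sandbox
                     check=True, stdout=subprocess.PIPE).stdout
print(out.decode('utf-8'))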

View File

@@ -8,33 +8,32 @@ class TorrentParser(abstract.AbstractParser):
mimetypes = {'application/x-bittorrent', }
whitelist = {b'announce', b'announce-list', b'info'}
def get_meta(self) -> Dict[str, str]:
metadata = {}
def __init__(self, filename):
super().__init__(filename)
with open(self.filename, 'rb') as f:
d = _BencodeHandler().bdecode(f.read())
if d is None:
return {'Unknown meta': 'Unable to parse torrent file "%s".' % self.filename}
for k, v in d.items():
if k not in self.whitelist:
metadata[k.decode('utf-8')] = v
return metadata
self.dict_repr = _BencodeHandler().bdecode(f.read())
if self.dict_repr is None:
raise ValueError
def get_meta(self) -> Dict[str, Union[str, dict]]:
metadata = {}
for key, value in self.dict_repr.items():
if key not in self.whitelist:
metadata[key.decode('utf-8')] = value
return metadata
def remove_all(self) -> bool:
cleaned = dict()
with open(self.filename, 'rb') as f:
d = _BencodeHandler().bdecode(f.read())
if d is None:
return False
for k, v in d.items():
if k in self.whitelist:
cleaned[k] = v
for key, value in self.dict_repr.items():
if key in self.whitelist:
cleaned[key] = value
with open(self.output_filename, 'wb') as f:
f.write(_BencodeHandler().bencode(cleaned))
self.dict_repr = cleaned # since we're stateful
return True
class _BencodeHandler(object):
class _BencodeHandler():
"""
Since bencode isn't that hard to parse,
MAT2 comes with its own parser, based on the spec
@@ -60,8 +59,6 @@ class _BencodeHandler(object):
def __decode_int(s: bytes) -> Tuple[int, bytes]:
s = s[1:]
next_idx = s.index(b'e')
if next_idx is None:
raise ValueError # missing suffix
if s.startswith(b'-0'):
raise ValueError # negative zero doesn't exist
elif s.startswith(b'0') and next_idx != 1:
@@ -70,32 +67,30 @@ class _BencodeHandler(object):
@staticmethod
def __decode_string(s: bytes) -> Tuple[bytes, bytes]:
sep = s.index(b':')
if set is None:
raise ValueError # missing suffix
str_len = int(s[:sep])
if str_len < 0:
raise ValueError
elif s[0] == b'0' and sep != 1:
colon = s.index(b':')
# FIXME Python3 is broken here, the call to `ord` shouldn't be needed,
# but apparently it is. This is utterly idiotic.
if (s[0] == ord('0') or s[0] == '0') and colon != 1:
raise ValueError
str_len = int(s[:colon])
s = s[1:]
return s[sep:sep+str_len], s[sep+str_len:]
return s[colon:colon+str_len], s[colon+str_len:]
def __decode_list(self, s: bytes) -> Tuple[list, bytes]:
r = list()
ret = list()
s = s[1:] # skip leading `l`
while s[0] != ord('e'):
v, s = self.__decode_func[s[0]](s)
r.append(v)
return r, s[1:]
value, s = self.__decode_func[s[0]](s)
ret.append(value)
return ret, s[1:]
def __decode_dict(self, s: bytes) -> Tuple[dict, bytes]:
r = dict()
ret = dict()
s = s[1:] # skip leading `d`
while s[0] != ord(b'e'):
k, s = self.__decode_string(s)
r[k], s = self.__decode_func[s[0]](s)
return r, s[1:]
key, s = self.__decode_string(s)
ret[key], s = self.__decode_func[s[0]](s)
return ret, s[1:]
@staticmethod
def __encode_int(x: bytes) -> bytes:
@@ -113,9 +108,9 @@ class _BencodeHandler(object):
def __encode_dict(self, x: dict) -> bytes:
ret = b''
for k, v in sorted(x.items()):
ret += self.__encode_func[type(k)](k)
ret += self.__encode_func[type(v)](v)
for key, value in sorted(x.items()):
ret += self.__encode_func[type(key)](key)
ret += self.__encode_func[type(value)](value)
return b'd' + ret + b'e'
def bencode(self, s: Union[dict, list, bytes, int]) -> bytes:
@@ -123,11 +118,11 @@ class _BencodeHandler(object):
def bdecode(self, s: bytes) -> Union[dict, None]:
try:
r, l = self.__decode_func[s[0]](s)
ret, trail = self.__decode_func[s[0]](s)
except (IndexError, KeyError, ValueError) as e:
logging.debug("Not a valid bencoded string: %s", e)
logging.warning("Not a valid bencoded string: %s", e)
return None
if l != b'':
logging.debug("Invalid bencoded value (data after valid prefix)")
if trail != b'':
logging.warning("Invalid bencoded value (data after valid prefix)")
return None
return r
return ret
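# Quick round-trip sketch of the handler above; the bencoded sample is made up,
# and the module-private class is imported here only for the illustration.
from libmat2.torrent import _BencodeHandler

handler = _BencodeHandler()
sample = b'd8:announce3:url4:infod6:lengthi42eee'
decoded = handler.bdecode(sample)
assert decoded == {b'announce': b'url', b'info': {b'length': 42}}
assert handler.bencode(decoded) == sample   # keys come back sorted, so this round-trips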

libmat2/video.py Normal file

@@ -0,0 +1,139 @@
import os
import logging
from typing import Dict, Union
from . import exiftool
from . import subprocess
class AbstractFFmpegParser(exiftool.ExiftoolParser):
""" Abstract parser for all FFmpeg-based ones, mainly for video. """
# Some file formats have mandatory metadata fields
meta_key_value_whitelist = {} # type: Dict[str, Union[str, int]]
def remove_all(self) -> bool:
if self.meta_key_value_whitelist:
logging.warning('The format of "%s" (%s) has some mandatory '
'metadata fields; mat2 filled them with standard '
'data.', self.filename, ', '.join(self.mimetypes))
cmd = [_get_ffmpeg_path(),
'-i', self.filename, # input file
'-y', # overwrite existing output file
'-map', '0', # copy all streams from input to output
'-codec', 'copy', # don't decode anything, just copy (speed!)
'-loglevel', 'panic', # Don't show log
'-hide_banner', # hide the banner
'-map_metadata', '-1', # remove superficial metadata
'-map_chapters', '-1', # remove chapters
'-disposition', '0', # Remove dispositions (check ffmpeg's manpage)
'-fflags', '+bitexact', # don't add any metadata
'-flags:v', '+bitexact', # don't add any metadata
'-flags:a', '+bitexact', # don't add any metadata
self.output_filename]
try:
subprocess.run(cmd, check=True,
input_filename=self.filename,
output_filename=self.output_filename)
except subprocess.CalledProcessError as e:
logging.error("Something went wrong during the processing of %s: %s", self.filename, e)
return False
return True
def get_meta(self) -> Dict[str, Union[str, dict]]:
meta = super().get_meta()
ret = dict() # type: Dict[str, Union[str, dict]]
for key, value in meta.items():
if key in self.meta_key_value_whitelist.keys():
if value == self.meta_key_value_whitelist[key]:
continue
ret[key] = value
return ret
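# Sketch of the key/value whitelisting performed by get_meta() above, with
# made-up metadata (the *_real key is purely hypothetical):
whitelist = {'CreationDate': '0000:00:00 00:00:00Z'}      # mandatory placeholder
meta = {'CreationDate': '0000:00:00 00:00:00Z',           # placeholder value, hidden
        'CreationDate_real': '2018:10:28 07:41:04'}       # real value, reported
shown = {k: v for k, v in meta.items()
         if not (k in whitelist and whitelist[k] == v)}
assert shown == {'CreationDate_real': '2018:10:28 07:41:04'}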
class WMVParser(AbstractFFmpegParser):
mimetypes = {'video/x-ms-wmv', }
meta_whitelist = {'AudioChannels', 'AudioCodecID', 'AudioCodecName',
'ErrorCorrectionType', 'AudioSampleRate', 'DataPackets',
'Directory', 'Duration', 'ExifToolVersion',
'FileAccessDate', 'FileInodeChangeDate', 'FileLength',
'FileModifyDate', 'FileName', 'FilePermissions',
'FileSize', 'FileType', 'FileTypeExtension',
'FrameCount', 'FrameRate', 'ImageHeight', 'ImageSize',
'ImageWidth', 'MIMEType', 'MaxBitrate', 'MaxPacketSize',
'Megapixels', 'MinPacketSize', 'Preroll', 'SendDuration',
'SourceFile', 'StreamNumber', 'VideoCodecName', }
meta_key_value_whitelist = { # some metadata are mandatory :/
'AudioCodecDescription': '',
'CreationDate': '0000:00:00 00:00:00Z',
'FileID': '00000000-0000-0000-0000-000000000000',
'Flags': 2, # FIXME: What is this? Why 2?
'ModifyDate': '0000:00:00 00:00:00',
'TimeOffset': '0 s',
'VideoCodecDescription': '',
'StreamType': 'Audio',
}
class AVIParser(AbstractFFmpegParser):
mimetypes = {'video/x-msvideo', }
meta_whitelist = {'SourceFile', 'ExifToolVersion', 'FileName', 'Directory',
'FileSize', 'FileModifyDate', 'FileAccessDate',
'FileInodeChangeDate', 'FilePermissions', 'FileType',
'FileTypeExtension', 'MIMEType', 'FrameRate', 'MaxDataRate',
'FrameCount', 'StreamCount', 'StreamType', 'VideoCodec',
'VideoFrameRate', 'VideoFrameCount', 'Quality',
'SampleSize', 'BMPVersion', 'ImageWidth', 'ImageHeight',
'Planes', 'BitDepth', 'Compression', 'ImageLength',
'PixelsPerMeterX', 'PixelsPerMeterY', 'NumColors',
'NumImportantColors', 'NumColors', 'NumImportantColors',
'RedMask', 'GreenMask', 'BlueMask', 'AlphaMask',
'ColorSpace', 'AudioCodec', 'AudioCodecRate',
'AudioSampleCount', 'AudioSampleCount',
'AudioSampleRate', 'Encoding', 'NumChannels',
'SampleRate', 'AvgBytesPerSec', 'BitsPerSample',
'Duration', 'ImageSize', 'Megapixels'}
class MP4Parser(AbstractFFmpegParser):
mimetypes = {'video/mp4', }
meta_whitelist = {'AudioFormat', 'AvgBitrate', 'Balance', 'TrackDuration',
'XResolution', 'YResolution', 'ExifToolVersion',
'FileAccessDate', 'FileInodeChangeDate', 'FileModifyDate',
'FileName', 'FilePermissions', 'MIMEType', 'FileType',
'FileTypeExtension', 'Directory', 'ImageWidth',
'ImageSize', 'ImageHeight', 'FileSize', 'SourceFile',
'BitDepth', 'Duration', 'AudioChannels',
'AudioBitsPerSample', 'AudioSampleRate', 'Megapixels',
'MovieDataSize', 'VideoFrameRate', 'MediaTimeScale',
'SourceImageHeight', 'SourceImageWidth',
'MatrixStructure', 'MediaDuration'}
meta_key_value_whitelist = { # some metadata are mandatory :/
'CreateDate': '0000:00:00 00:00:00',
'CurrentTime': '0 s',
'MediaCreateDate': '0000:00:00 00:00:00',
'MediaLanguageCode': 'und',
'MediaModifyDate': '0000:00:00 00:00:00',
'ModifyDate': '0000:00:00 00:00:00',
'OpColor': '0 0 0',
'PosterTime': '0 s',
'PreferredRate': '1',
'PreferredVolume': '100.00%',
'PreviewDuration': '0 s',
'PreviewTime': '0 s',
'SelectionDuration': '0 s',
'SelectionTime': '0 s',
'TrackCreateDate': '0000:00:00 00:00:00',
'TrackModifyDate': '0000:00:00 00:00:00',
'TrackVolume': '0.00%',
}
def _get_ffmpeg_path() -> str: # pragma: no cover
ffmpeg_path = '/usr/bin/ffmpeg'
if os.path.isfile(ffmpeg_path):
if os.access(ffmpeg_path, os.X_OK):
return ffmpeg_path
raise RuntimeError("Unable to find ffmpeg")

mat2

@@ -1,22 +1,30 @@
#!/usr/bin/python3
#!/usr/bin/env python3
import os
from typing import Tuple
from typing import Tuple, Generator, List, Union
import sys
import itertools
import mimetypes
import argparse
import multiprocessing
import logging
import unicodedata
try:
from libmat2 import parser_factory, unsupported_extensions
from libmat2 import parser_factory, UNSUPPORTED_EXTENSIONS
from libmat2 import check_dependencies, UnknownMemberPolicy
except ValueError as e:
print(e)
sys.exit(1)
__version__ = '0.1.3'
__version__ = '0.7.0'
def __check_file(filename: str, mode: int = os.R_OK) -> bool:
# Make pyflakes happy
assert Tuple
assert Union
logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.WARNING)
def __check_file(filename: str, mode: int=os.R_OK) -> bool:
if not os.path.exists(filename):
print("[-] %s is doesn't exist." % filename)
return False
@@ -29,19 +37,25 @@ def __check_file(filename: str, mode: int = os.R_OK) -> bool:
return True
def create_arg_parser():
def create_arg_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(description='Metadata anonymisation toolkit 2')
parser.add_argument('files', nargs='*')
parser.add_argument('files', nargs='*', help='the files to process')
parser.add_argument('-v', '--version', action='version',
version='MAT2 %s' % __version__)
parser.add_argument('-l', '--list', action='store_true',
help='list all supported fileformats')
parser.add_argument('--check-dependencies', action='store_true',
help='check if MAT2 has all the dependencies it needs')
parser.add_argument('-V', '--verbose', action='store_true',
help='show more verbose status information')
parser.add_argument('--unknown-members', metavar='policy', default='abort',
help='how to handle unknown members of archive-style files (policy should' +
' be one of: %s)' % ', '.join(p.value for p in UnknownMemberPolicy))
info = parser.add_mutually_exclusive_group()
info.add_argument('-c', '--check', action='store_true',
help='check if a file is free of harmful metadata')
info.add_argument('-s', '--show', action='store_true',
help='list all the harmful metadata of a file without removing them')
help='list harmful metadata detectable by MAT2 without removing them')
info.add_argument('-L', '--lightweight', action='store_true',
help='remove SOME metadata')
return parser
@@ -55,16 +69,37 @@ def show_meta(filename: str):
if p is None:
print("[-] %s's format (%s) is not supported" % (filename, mtype))
return
__print_meta(filename, p.get_meta())
def __print_meta(filename: str, metadata: dict, depth: int=1):
padding = " " * depth*2
if not metadata:
print(padding + "No metadata found")
return
print("[%s] Metadata for %s:" % ('+'*depth, filename))
for (k, v) in sorted(metadata.items()):
if isinstance(v, dict):
__print_meta(k, v, depth+1)
continue
# Remove control characters
# We might use 'Cc' instead of 'C', but better safe than sorry
# https://www.unicode.org/reports/tr44/#GC_Values_Table
try:
v = ''.join(ch for ch in v if not unicodedata.category(ch).startswith('C'))
except TypeError:
pass # for things that aren't iterable
print("[+] Metadata for %s:" % filename)
for k, v in p.get_meta().items():
try: # FIXME this is ugly.
print(" %s: %s" % (k, v))
print(padding + " %s: %s" % (k, v))
except UnicodeEncodeError:
print(" %s: harmful content" % k)
print(padding + " %s: harmful content" % k)
def clean_meta(params: Tuple[str, bool]) -> bool:
filename, is_lightweigth = params
def clean_meta(filename: str, is_lightweight: bool, policy: UnknownMemberPolicy) -> bool:
if not __check_file(filename, os.R_OK|os.W_OK):
return False
@@ -72,29 +107,35 @@ def clean_meta(params: Tuple[str, bool]) -> bool:
if p is None:
print("[-] %s's format (%s) is not supported" % (filename, mtype))
return False
if is_lightweigth:
return p.remove_all_lightweight()
return p.remove_all()
p.unknown_member_policy = policy
p.lightweight_cleaning = is_lightweight
try:
return p.remove_all()
except RuntimeError as e:
print("[-] %s can't be cleaned: %s" % (filename, e))
return False
def show_parsers():
print('[+] Supported formats:')
formats = list()
for parser in parser_factory._get_parsers():
formats = set() # Set[str]
for parser in parser_factory._get_parsers(): # type: ignore
for mtype in parser.mimetypes:
extensions = set()
extensions = set() # Set[str]
for extension in mimetypes.guess_all_extensions(mtype):
if extension[1:] not in unsupported_extensions: # skip the dot
if extension not in UNSUPPORTED_EXTENSIONS:
extensions.add(extension)
if not extensions:
# we're not supporting a single extension in the current
# mimetype, so there is no point in showing the mimetype at all
continue
formats.append(' - %s (%s)' % (mtype, ', '.join(extensions)))
formats.add(' - %s (%s)' % (mtype, ', '.join(extensions)))
print('\n'.join(sorted(formats)))
def __get_files_recursively(files):
def __get_files_recursively(files: List[str]) -> Generator[str, None, None]:
for f in files:
if os.path.isdir(f):
for path, _, _files in os.walk(f):
@@ -105,14 +146,23 @@ def __get_files_recursively(files):
elif __check_file(f):
yield f
def main():
def main() -> int:
arg_parser = create_arg_parser()
args = arg_parser.parse_args()
if args.verbose:
logging.basicConfig(level=logging.INFO)
if not args.files:
if not args.list:
return arg_parser.print_help()
show_parsers()
if args.list:
show_parsers()
return 0
elif args.check_dependencies:
print("Dependencies required for MAT2 %s:" % __version__)
for key, value in sorted(check_dependencies().items()):
print('- %s: %s' % (key, 'yes' if value else 'no'))
else:
arg_parser.print_help()
return 0
elif args.show:
@@ -121,12 +171,16 @@ def main():
return 0
else:
p = multiprocessing.Pool()
mode = (args.lightweight is True)
l = zip(__get_files_recursively(args.files), itertools.repeat(mode))
policy = UnknownMemberPolicy(args.unknown_members)
if policy == UnknownMemberPolicy.KEEP:
logging.warning('Keeping unknown member files may leak metadata in the resulting file!')
no_failure = True
for f in __get_files_recursively(args.files):
if clean_meta(f, args.lightweight, policy) is False:
no_failure = False
return 0 if no_failure is True else -1
ret = list(p.imap_unordered(clean_meta, list(l)))
return 0 if all(ret) else -1
if __name__ == '__main__':
sys.exit(main())

nautilus/README.md Normal file

@@ -0,0 +1,12 @@
# mat2's Nautilus extension
# Dependencies
- Nautilus (now known as [Files](https://wiki.gnome.org/action/show/Apps/Files))
- [nautilus-python](https://gitlab.gnome.org/GNOME/nautilus-python) >= 2.10
# Installation
Simply copy the `mat2.py` file to `~/.local/share/nautilus-python/extensions`,
and launch Nautilus; you should now have a "Remove metadata" item in the
right-click menu on supported files.

nautilus/mat2.py Normal file

@@ -0,0 +1,244 @@
#!/usr/bin/env python3
"""
Because writing a GUI is non-trivial (cf. https://0xacab.org/jvoisin/mat2/issues/3),
we decided to write a Nautilus extension instead
(cf. https://0xacab.org/jvoisin/mat2/issues/2).
The code is a little bit convoluted because Gtk isn't thread-safe:
we're not allowed to call anything Gtk-related outside of the main
thread, so we have to resort to using a `queue` to pass "messages" around.
"""
# pylint: disable=no-name-in-module,unused-argument,no-self-use,import-error
import queue
import threading
from typing import Tuple, Optional, List
from urllib.parse import unquote
import gi
gi.require_version('Nautilus', '3.0')
gi.require_version('Gtk', '3.0')
gi.require_version('GdkPixbuf', '2.0')
from gi.repository import Nautilus, GObject, Gtk, Gio, GLib, GdkPixbuf
from libmat2 import parser_factory
def _remove_metadata(fpath) -> Tuple[bool, Optional[str]]:
""" This is a simple wrapper around libmat2, because it's
easier and cleaner this way.
"""
parser, mtype = parser_factory.get_parser(fpath)
if parser is None:
return False, mtype
return parser.remove_all(), mtype
class Mat2Extension(GObject.GObject, Nautilus.MenuProvider, Nautilus.LocationWidgetProvider):
""" This class adds an item to the right-clic menu in Nautilus. """
def __init__(self):
super().__init__()
self.infobar_hbox = None
self.infobar = None
self.failed_items = list()
def __infobar_failure(self):
""" Add an hbox to the `infobar` warning about the fact that we didn't
manage to remove the metadata from every single file.
"""
self.infobar.set_show_close_button(True)
self.infobar_hbox = Gtk.Box(orientation=Gtk.Orientation.HORIZONTAL)
btn = Gtk.Button("Show")
btn.connect("clicked", self.__cb_show_failed)
self.infobar_hbox.pack_end(btn, False, False, 0)
infobar_msg = Gtk.Label("Failed to clean some items")
self.infobar_hbox.pack_start(infobar_msg, False, False, 0)
self.infobar.get_content_area().pack_start(self.infobar_hbox, True, True, 0)
self.infobar.show_all()
def get_widget(self, uri, window) -> Gtk.Widget:
""" This is the method that we have to implement (because we're
a LocationWidgetProvider) in order to show our infobar.
"""
self.infobar = Gtk.InfoBar()
self.infobar.set_message_type(Gtk.MessageType.ERROR)
self.infobar.connect("response", self.__cb_infobar_response)
return self.infobar
def __cb_infobar_response(self, infobar, response):
""" Callback for the infobar close button.
"""
if response == Gtk.ResponseType.CLOSE:
self.infobar_hbox.destroy()
self.infobar.hide()
def __cb_show_failed(self, button):
""" Callback to show a popup containing a list of files
that we didn't manage to clean.
"""
# FIXME this should be done only once the window is destroyed
self.infobar_hbox.destroy()
self.infobar.hide()
window = Gtk.Window()
headerbar = Gtk.HeaderBar()
window.set_titlebar(headerbar)
headerbar.props.title = "Metadata removal failed"
close_buton = Gtk.Button("Close")
close_buton.connect("clicked", lambda _: window.close())
headerbar.pack_end(close_buton)
box = Gtk.Box(orientation=Gtk.Orientation.VERTICAL)
window.add(box)
box.add(self.__create_treeview())
window.show_all()
@staticmethod
def __validate(fileinfo) -> Tuple[bool, str]:
""" Validate if a given file FileInfo `fileinfo` can be processed.
Returns a boolean, and a textreason why"""
if fileinfo.get_uri_scheme() != "file" or fileinfo.is_directory():
return False, "Not a file"
elif not fileinfo.can_write():
return False, "Not writeable"
return True, ""
def __create_treeview(self) -> Gtk.TreeView:
liststore = Gtk.ListStore(GdkPixbuf.Pixbuf, str, str)
treeview = Gtk.TreeView(model=liststore)
renderer_pixbuf = Gtk.CellRendererPixbuf()
column_pixbuf = Gtk.TreeViewColumn("Icon", renderer_pixbuf, pixbuf=0)
treeview.append_column(column_pixbuf)
for idx, name in enumerate(['File', 'Reason']):
renderer_text = Gtk.CellRendererText()
column_text = Gtk.TreeViewColumn(name, renderer_text, text=idx+1)
treeview.append_column(column_text)
for (fname, mtype, reason) in self.failed_items:
# This part is all about adding mimetype icons to the liststore
icon = Gio.content_type_get_icon('text/plain' if not mtype else mtype)
# in case we don't have the corresponding icon,
# we're adding `text/plain`, because we have this one for sure™
names = icon.get_names() + ['text/plain', ]
icon_theme = Gtk.IconTheme.get_default()
for name in names:
try:
img = icon_theme.load_icon(name, Gtk.IconSize.BUTTON, 0)
break
except GLib.GError:
pass
liststore.append([img, fname, reason])
treeview.show_all()
return treeview
def __create_progressbar(self) -> Gtk.ProgressBar:
""" Create the progressbar used to notify that files are currently
being processed.
"""
self.infobar.set_show_close_button(False)
self.infobar.set_message_type(Gtk.MessageType.INFO)
self.infobar_hbox = Gtk.Box(orientation=Gtk.Orientation.HORIZONTAL)
progressbar = Gtk.ProgressBar()
self.infobar_hbox.pack_start(progressbar, True, True, 0)
progressbar.set_show_text(True)
self.infobar.get_content_area().pack_start(self.infobar_hbox, True, True, 0)
self.infobar.show_all()
return progressbar
def __update_progressbar(self, processing_queue, progressbar) -> bool:
""" This method is run via `Glib.add_idle` to update the progressbar."""
try:
fname = processing_queue.get(block=False)
except queue.Empty:
return True
# `None` is the marker put in the queue to signal that every selected
# file was processed.
if fname is None:
self.infobar_hbox.destroy()
self.infobar.hide()
if len(self.failed_items):
self.__infobar_failure()
if not processing_queue.empty():
print("Something went wrong, the queue isn't empty :/")
return False
progressbar.pulse()
progressbar.set_text("Cleaning %s" % fname)
progressbar.show_all()
self.infobar_hbox.show_all()
self.infobar.show_all()
return True
def __clean_files(self, files: list, processing_queue: queue.Queue) -> bool:
""" This method is threaded in order to avoid blocking the GUI
while cleaning up the files.
"""
for fileinfo in files:
fname = fileinfo.get_name()
processing_queue.put(fname)
valid, reason = self.__validate(fileinfo)
if not valid:
self.failed_items.append((fname, None, reason))
continue
fpath = unquote(fileinfo.get_uri()[7:]) # `len('file://') = 7`
success, mtype = _remove_metadata(fpath)
if not success:
self.failed_items.append((fname, mtype, 'Unsupported/invalid'))
processing_queue.put(None) # signal that we processed all the files
return True
def __cb_menu_activate(self, menu, files):
""" This method is called when the user clicked the "clean metadata"
menu item.
"""
self.failed_items = list()
progressbar = self.__create_progressbar()
progressbar.set_pulse_step(1.0 / len(files))
self.infobar.show_all()
processing_queue = queue.Queue()
GLib.idle_add(self.__update_progressbar, processing_queue, progressbar)
thread = threading.Thread(target=self.__clean_files, args=(files, processing_queue))
thread.daemon = True
thread.start()
def get_background_items(self, window, file):
""" https://bugzilla.gnome.org/show_bug.cgi?id=784278 """
return None
def get_file_items(self, window, files) -> Optional[List[Nautilus.MenuItem]]:
""" This method is the one allowing us to create a menu item.
"""
# Do not show the menu item if not a single file has a chance to be
# processed by mat2.
if not any([is_valid for (is_valid, _) in map(self.__validate, files)]):
return None
item = Nautilus.MenuItem(
name="MAT2::Remove_metadata",
label="Remove metadata",
tip="Remove metadata"
)
item.connect('activate', self.__cb_menu_activate, files)
return [item, ]


@@ -1,29 +0,0 @@
#!/usr/bin/env python3
import gi
gi.require_version('Nautilus', '3.0')
from gi.repository import Nautilus, GObject
class ColumnExtension(GObject.GObject, Nautilus.MenuProvider):
def menu_activate_cb(self, menu, file):
print "menu_activate_cb", file
# TODO: clean metadata here
def get_background_items(self, window, file):
""" https://bugzilla.gnome.org/show_bug.cgi?id=784278 """
return None
def get_file_items(self, window, files):
if len(files) != 1: # we're not supporting multiple files for now
return
file = files[0]
item = Nautilus.MenuItem(
name="MAT2::Remove_metadata",
label="Remove metadata from %s" % file.get_name(),
tip="Remove metadata from %s" % file.get_name()
)
item.connect('activate', self.menu_activate_cb, file)
return [item]


@@ -5,7 +5,7 @@ with open("README.md", "r") as fh:
setuptools.setup(
name="mat2",
version='0.1.3',
version='0.7.0',
author="Julien (jvoisin) Voisin",
author_email="julien.voisin+mat2@dustri.org",
description="A handy tool to trash your metadata",
@@ -20,7 +20,7 @@ setuptools.setup(
'pycairo',
],
packages=setuptools.find_packages(exclude=('tests', )),
classifiers=(
classifiers=[
"Development Status :: 3 - Alpha",
"Environment :: Console",
"License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)",
@@ -28,7 +28,7 @@ setuptools.setup(
"Programming Language :: Python :: 3 :: Only",
"Topic :: Security",
"Intended Audience :: End Users/Desktop",
),
],
project_urls={
'bugtracker': 'https://0xacab.org/jvoisin/mat2/issues',
},

BIN
tests/data/dirty.avi Normal file

BIN
tests/data/dirty.gif Normal file

tests/data/dirty.html Normal file

@@ -0,0 +1,14 @@
<html>
<head>
<meta content="vim" name="generator"/>
<meta content="jvoisin" name="author"/>
</head>
<body>
<p>
<h1>Hello</h1>
I am a web page.
Please <b>love</b> me.
Here, have a pretty picture: <img src='dirty.jpg' alt='a pretty picture'/>
</p>
</body>
</html>

BIN
tests/data/dirty.mp4 Normal file

BIN
tests/data/dirty.wmv Normal file


@@ -4,62 +4,102 @@ import subprocess
import unittest
mat2_binary = ['./mat2']
if 'MAT2_GLOBAL_PATH_TESTSUITE' in os.environ:
# Debian runs tests after installing the package
# https://0xacab.org/jvoisin/mat2/issues/16#note_153878
mat2_binary = ['/usr/bin/env', 'mat2']
class TestHelp(unittest.TestCase):
def test_help(self):
proc = subprocess.Popen(['./mat2', '--help'], stdout=subprocess.PIPE)
proc = subprocess.Popen(mat2_binary + ['--help'], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [-c | -s | -L] [files [files ...]]', stdout)
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]',
stdout)
self.assertIn(b'[--unknown-members policy] [-s | -L]', stdout)
def test_no_arg(self):
proc = subprocess.Popen(['./mat2'], stdout=subprocess.PIPE)
proc = subprocess.Popen(mat2_binary, stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [-c | -s | -L] [files [files ...]]', stdout)
self.assertIn(b'usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]',
stdout)
self.assertIn(b'[--unknown-members policy] [-s | -L]', stdout)
class TestVersion(unittest.TestCase):
def test_version(self):
proc = subprocess.Popen(['./mat2', '--version'], stdout=subprocess.PIPE)
proc = subprocess.Popen(mat2_binary + ['--version'], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertTrue(stdout.startswith(b'MAT2 '))
class TestExclusiveArgs(unittest.TestCase):
def test_version(self):
proc = subprocess.Popen(['./mat2', '-s', '-c'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate()
self.assertIn(b'mat2: error: argument -c/--check: not allowed with argument -s/--show', stderr)
class TestDependencies(unittest.TestCase):
def test_dependencies(self):
proc = subprocess.Popen(mat2_binary + ['--check-dependencies'], stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertTrue(b'MAT2' in stdout)
class TestReturnValue(unittest.TestCase):
def test_nonzero(self):
ret = subprocess.call(['./mat2', './mat2'], stdout=subprocess.DEVNULL)
ret = subprocess.call(mat2_binary + ['mat2'], stdout=subprocess.DEVNULL)
self.assertEqual(255, ret)
ret = subprocess.call(['./mat2', '--whololo'], stderr=subprocess.DEVNULL)
ret = subprocess.call(mat2_binary + ['--whololo'], stderr=subprocess.DEVNULL)
self.assertEqual(2, ret)
def test_zero(self):
ret = subprocess.call(['./mat2'], stdout=subprocess.DEVNULL)
ret = subprocess.call(mat2_binary, stdout=subprocess.DEVNULL)
self.assertEqual(0, ret)
ret = subprocess.call(['./mat2', '--show', './mat2'], stdout=subprocess.DEVNULL)
ret = subprocess.call(mat2_binary + ['--show', 'mat2'], stdout=subprocess.DEVNULL)
self.assertEqual(0, ret)
class TestCleanFolder(unittest.TestCase):
def test_jpg(self):
try:
os.mkdir('./tests/data/folder/')
except FileExistsError:
pass
shutil.copy('./tests/data/dirty.jpg', './tests/data/folder/clean1.jpg')
shutil.copy('./tests/data/dirty.jpg', './tests/data/folder/clean2.jpg')
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/folder/'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: Created with GIMP', stdout)
proc = subprocess.Popen(mat2_binary + ['./tests/data/folder/'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
os.remove('./tests/data/folder/clean1.jpg')
os.remove('./tests/data/folder/clean2.jpg')
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/folder/'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertNotIn(b'Comment: Created with GIMP', stdout)
self.assertIn(b'No metadata found', stdout)
shutil.rmtree('./tests/data/folder/')
class TestCleanMeta(unittest.TestCase):
def test_jpg(self):
shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
proc = subprocess.Popen(['./mat2', '--show', './tests/data/clean.jpg'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/clean.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: Created with GIMP', stdout)
proc = subprocess.Popen(['./mat2', './tests/data/clean.jpg'],
proc = subprocess.Popen(mat2_binary + ['./tests/data/clean.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
proc = subprocess.Popen(['./mat2', '--show', './tests/data/clean.cleaned.jpg'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/clean.cleaned.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertNotIn(b'Comment: Created with GIMP', stdout)
@@ -69,32 +109,34 @@ class TestCleanMeta(unittest.TestCase):
class TestIsSupported(unittest.TestCase):
def test_pdf(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.pdf'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.pdf'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertNotIn(b"isn't supported", stdout)
class TestGetMeta(unittest.TestCase):
maxDiff = None
def test_pdf(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.pdf'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.pdf'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'producer: pdfTeX-1.40.14', stdout)
self.assertIn(b'Producer: pdfTeX-1.40.14', stdout)
def test_png(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.png'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.png'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: This is a comment, be careful!', stdout)
def test_jpg(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.jpg'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: Created with GIMP', stdout)
def test_docx(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.docx'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.docx'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Application: LibreOffice/5.4.5.1$Linux_X86_64', stdout)
@@ -102,7 +144,7 @@ class TestGetMeta(unittest.TestCase):
self.assertIn(b'revision: 1', stdout)
def test_odt(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.odt'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.odt'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'generator: LibreOffice/3.3$Unix', stdout)
@@ -110,25 +152,32 @@ class TestGetMeta(unittest.TestCase):
self.assertIn(b'date_time: 2011-07-26 02:40:16', stdout)
def test_mp3(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.mp3'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.mp3'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'TALB: harmfull', stdout)
self.assertIn(b'COMM::: Thank you for using MAT !', stdout)
def test_flac(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.flac'],
stdout=subprocess.PIPE)
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.flac'],
stdout=subprocess.PIPE, bufsize=0)
stdout, _ = proc.communicate()
self.assertIn(b'comments: Thank you for using MAT !', stdout)
self.assertIn(b'genre: Python', stdout)
self.assertIn(b'title: I am so', stdout)
def test_ogg(self):
proc = subprocess.Popen(['./mat2', '--show', './tests/data/dirty.ogg'],
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/dirty.ogg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'comments: Thank you for using MAT !', stdout)
self.assertIn(b'genre: Python', stdout)
self.assertIn(b'i am a : various comment', stdout)
self.assertIn(b'artist: jvoisin', stdout)
class TestControlCharInjection(unittest.TestCase):
def test_jpg(self):
proc = subprocess.Popen(mat2_binary + ['--show', './tests/data/control_chars.jpg'],
stdout=subprocess.PIPE)
stdout, _ = proc.communicate()
self.assertIn(b'Comment: GQ\n', stdout)


@@ -1,11 +1,54 @@
#!/usr/bin/python3
#!/usr/bin/env python3
import unittest
import shutil
import os
import logging
import zipfile
from libmat2 import pdf, images, audio, office, parser_factory, torrent, harmless
from libmat2 import pdf, images, audio, office, parser_factory, torrent
from libmat2 import harmless, video, html
# No need to log messages: should something go wrong,
# the testsuite _will_ fail.
logger = logging.getLogger()
logger.setLevel(logging.FATAL)
class TestInexistentFiles(unittest.TestCase):
def test_ro(self):
parser, mimetype = parser_factory.get_parser('/etc/passwd')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_notaccessible(self):
parser, mimetype = parser_factory.get_parser('/etc/shadow')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_folder(self):
parser, mimetype = parser_factory.get_parser('./tests/')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_inexistingfile(self):
parser, mimetype = parser_factory.get_parser('./tests/NONEXISTING_FILE')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_chardevice(self):
parser, mimetype = parser_factory.get_parser('/dev/zero')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
def test_brokensymlink(self):
shutil.copy('./tests/test_libmat2.py', './tests/clean.py')
os.symlink('./tests/clean.py', './tests/SYMLINK')
os.remove('./tests/clean.py')
parser, mimetype = parser_factory.get_parser('./tests/SYMLINK')
self.assertEqual(mimetype, None)
self.assertEqual(parser, None)
os.unlink('./tests/SYMLINK')
class TestUnsupportedFiles(unittest.TestCase):
def test_pdf(self):
@@ -15,6 +58,21 @@ class TestUnsupportedFiles(unittest.TestCase):
self.assertEqual(parser, None)
os.remove('./tests/clean.py')
class TestCorruptedEmbedded(unittest.TestCase):
def test_docx(self):
shutil.copy('./tests/data/embedded_corrupted.docx', './tests/data/clean.docx')
parser, _ = parser_factory.get_parser('./tests/data/clean.docx')
self.assertFalse(parser.remove_all())
self.assertIsNotNone(parser.get_meta())
os.remove('./tests/data/clean.docx')
def test_odt(self):
shutil.copy('./tests/data/embedded_corrupted.odt', './tests/data/clean.odt')
parser, _ = parser_factory.get_parser('./tests/data/clean.odt')
self.assertFalse(parser.remove_all())
self.assertTrue(parser.get_meta())
os.remove('./tests/data/clean.odt')
class TestExplicitelyUnsupportedFiles(unittest.TestCase):
def test_pdf(self):
@@ -25,6 +83,26 @@ class TestExplicitelyUnsupportedFiles(unittest.TestCase):
os.remove('./tests/data/clean.py')
class TestWrongContentTypesFileOffice(unittest.TestCase):
def test_office_incomplete(self):
shutil.copy('./tests/data/malformed_content_types.docx', './tests/data/clean.docx')
p = office.MSOfficeParser('./tests/data/clean.docx')
self.assertIsNotNone(p)
self.assertFalse(p.remove_all())
os.remove('./tests/data/clean.docx')
def test_office_broken(self):
shutil.copy('./tests/data/broken_xml_content_types.docx', './tests/data/clean.docx')
with self.assertRaises(ValueError):
office.MSOfficeParser('./tests/data/clean.docx')
os.remove('./tests/data/clean.docx')
def test_office_absent(self):
shutil.copy('./tests/data/no_content_types.docx', './tests/data/clean.docx')
with self.assertRaises(ValueError):
office.MSOfficeParser('./tests/data/clean.docx')
os.remove('./tests/data/clean.docx')
class TestCorruptedFiles(unittest.TestCase):
def test_pdf(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
@@ -40,21 +118,40 @@ class TestCorruptedFiles(unittest.TestCase):
def test_png2(self):
shutil.copy('./tests/test_libmat2.py', './tests/clean.png')
parser, mimetype = parser_factory.get_parser('./tests/clean.png')
parser, _ = parser_factory.get_parser('./tests/clean.png')
self.assertIsNone(parser)
os.remove('./tests/clean.png')
def test_torrent(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.torrent')
p = torrent.TorrentParser('./tests/data/clean.torrent')
self.assertFalse(p.remove_all())
expected = {'Unknown meta': 'Unable to parse torrent file "./tests/data/clean.torrent".'}
self.assertEqual(p.get_meta(), expected)
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "a") as f:
f.write("trailing garbage")
p = torrent.TorrentParser('./tests/data/clean.torrent')
self.assertEqual(p.get_meta(), expected)
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "w") as f:
f.write("i-0e")
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "w") as f:
f.write("i00e")
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "w") as f:
f.write("01:AAAAAAAAA")
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
with open("./tests/data/clean.torrent", "w") as f:
f.write("1:aaa")
with self.assertRaises(ValueError):
torrent.TorrentParser('./tests/data/clean.torrent')
os.remove('./tests/data/clean.torrent')
def test_odg(self):
@@ -65,23 +162,110 @@ class TestCorruptedFiles(unittest.TestCase):
def test_bmp(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.bmp')
harmless.HarmlessParser('./tests/data/clean.bmp')
ret = harmless.HarmlessParser('./tests/data/clean.bmp')
self.assertIsNotNone(ret)
os.remove('./tests/data/clean.bmp')
def test_docx(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.docx')
with self.assertRaises(ValueError):
office.MSOfficeParser('./tests/data/clean.docx')
office.MSOfficeParser('./tests/data/clean.docx')
os.remove('./tests/data/clean.docx')
def test_flac(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.flac')
with self.assertRaises(ValueError):
audio.FLACParser('./tests/data/clean.flac')
audio.FLACParser('./tests/data/clean.flac')
os.remove('./tests/data/clean.flac')
def test_mp3(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.mp3')
with self.assertRaises(ValueError):
audio.MP3Parser('./tests/data/clean.mp3')
audio.MP3Parser('./tests/data/clean.mp3')
os.remove('./tests/data/clean.mp3')
def test_jpg(self):
shutil.copy('./tests/data/dirty.mp3', './tests/data/clean.jpg')
with self.assertRaises(ValueError):
images.JPGParser('./tests/data/clean.jpg')
os.remove('./tests/data/clean.jpg')
def test_png_lightweight(self):
return
shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.png')
p = images.PNGParser('./tests/data/clean.png')
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.png')
def test_avi(self):
try:
video._get_ffmpeg_path()
except RuntimeError:
raise unittest.SkipTest
shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.avi')
p = video.AVIParser('./tests/data/clean.avi')
self.assertFalse(p.remove_all())
os.remove('./tests/data/clean.avi')
def test_avi_injection(self):
try:
video._get_ffmpeg_path()
except RuntimeError:
raise unittest.SkipTest
shutil.copy('./tests/data/dirty.torrent', './tests/data/--output.avi')
p = video.AVIParser('./tests/data/--output.avi')
self.assertFalse(p.remove_all())
os.remove('./tests/data/--output.avi')
def test_zip(self):
with zipfile.ZipFile('./tests/data/dirty.zip', 'w') as zout:
zout.write('./tests/data/dirty.flac')
zout.write('./tests/data/dirty.docx')
zout.write('./tests/data/dirty.jpg')
zout.write('./tests/data/embedded_corrupted.docx')
p, mimetype = parser_factory.get_parser('./tests/data/dirty.zip')
self.assertEqual(mimetype, 'application/zip')
meta = p.get_meta()
self.assertEqual(meta['tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
self.assertFalse(p.remove_all())
os.remove('./tests/data/dirty.zip')
def test_html(self):
shutil.copy('./tests/data/dirty.html', './tests/data/clean.html')
with open('./tests/data/clean.html', 'a') as f:
f.write('<open>but not</closed>')
with self.assertRaises(ValueError):
html.HTMLParser('./tests/data/clean.html')
os.remove('./tests/data/clean.html')
# Yes, we're able to deal with malformed html :/
shutil.copy('./tests/data/dirty.html', './tests/data/clean.html')
with open('./tests/data/clean.html', 'a') as f:
f.write('<meta name=\'this" is="weird"/>')
p = html.HTMLParser('./tests/data/clean.html')
self.assertTrue(p.remove_all())
p = html.HTMLParser('./tests/data/clean.cleaned.html')
self.assertEqual(p.get_meta(), {})
os.remove('./tests/data/clean.html')
os.remove('./tests/data/clean.cleaned.html')
with open('./tests/data/clean.html', 'w') as f:
f.write('</close>')
with self.assertRaises(ValueError):
html.HTMLParser('./tests/data/clean.html')
os.remove('./tests/data/clean.html')
with open('./tests/data/clean.html', 'w') as f:
f.write('<notclosed>')
p = html.HTMLParser('./tests/data/clean.html')
with self.assertRaises(ValueError):
p.get_meta()
p = html.HTMLParser('./tests/data/clean.html')
with self.assertRaises(ValueError):
p.remove_all()
os.remove('./tests/data/clean.html')

tests/test_deep_cleaning.py Normal file

@@ -0,0 +1,135 @@
#!/usr/bin/env python3
import unittest
import shutil
import os
import zipfile
import tempfile
from libmat2 import office, parser_factory
class TestZipMetadata(unittest.TestCase):
def __check_deep_meta(self, p):
tempdir = tempfile.mkdtemp()
zipin = zipfile.ZipFile(p.filename)
zipin.extractall(tempdir)
for subdir, dirs, files in os.walk(tempdir):
for f in files:
complete_path = os.path.join(subdir, f)
inside_p, _ = parser_factory.get_parser(complete_path)
if inside_p is None:
continue
self.assertEqual(inside_p.get_meta(), {})
shutil.rmtree(tempdir)
def __check_zip_meta(self, p):
zipin = zipfile.ZipFile(p.filename)
for item in zipin.infolist():
self.assertEqual(item.comment, b'')
self.assertEqual(item.date_time, (1980, 1, 1, 0, 0, 0))
self.assertEqual(item.create_system, 3) # 3 is UNIX
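# For reference, a sketch of writing a zip member that satisfies the checks
# above (an assumed approach, not necessarily the exact code in libmat2):
import zipfile
with zipfile.ZipFile('clean.zip', 'w') as zout:
    zinfo = zipfile.ZipInfo('mimetype', date_time=(1980, 1, 1, 0, 0, 0))
    zinfo.create_system = 3   # 3 means UNIX
    zinfo.comment = b''
    zout.writestr(zinfo, b'application/vnd.oasis.opendocument.text')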
def test_office(self):
shutil.copy('./tests/data/dirty.docx', './tests/data/clean.docx')
p = office.MSOfficeParser('./tests/data/clean.docx')
meta = p.get_meta()
self.assertIsNotNone(meta)
self.assertEqual(meta['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
ret = p.remove_all()
self.assertTrue(ret)
p = office.MSOfficeParser('./tests/data/clean.cleaned.docx')
self.assertEqual(p.get_meta(), {})
self.__check_zip_meta(p)
self.__check_deep_meta(p)
os.remove('./tests/data/clean.docx')
os.remove('./tests/data/clean.cleaned.docx')
def test_libreoffice(self):
shutil.copy('./tests/data/dirty.odt', './tests/data/clean.odt')
p = office.LibreOfficeParser('./tests/data/clean.odt')
meta = p.get_meta()
self.assertIsNotNone(meta)
ret = p.remove_all()
self.assertTrue(ret)
p = office.LibreOfficeParser('./tests/data/clean.cleaned.odt')
self.assertEqual(p.get_meta(), {})
self.__check_zip_meta(p)
self.__check_deep_meta(p)
os.remove('./tests/data/clean.odt')
os.remove('./tests/data/clean.cleaned.odt')
class TestZipOrder(unittest.TestCase):
def test_libreoffice(self):
shutil.copy('./tests/data/dirty.odt', './tests/data/clean.odt')
p = office.LibreOfficeParser('./tests/data/clean.odt')
meta = p.get_meta()
self.assertIsNotNone(meta)
is_unordered = False
with zipfile.ZipFile('./tests/data/clean.odt') as zin:
previous_name = ''
for item in zin.infolist():
if previous_name == '':
previous_name = item.filename
continue
elif item.filename < previous_name:
is_unordered = True
break
self.assertTrue(is_unordered)
ret = p.remove_all()
self.assertTrue(ret)
with zipfile.ZipFile('./tests/data/clean.cleaned.odt') as zin:
previous_name = ''
for item in zin.infolist():
if previous_name == '':
previous_name = item.filename
continue
self.assertGreaterEqual(item.filename, previous_name)
os.remove('./tests/data/clean.odt')
os.remove('./tests/data/clean.cleaned.odt')
class TestRsidRemoval(unittest.TestCase):
def test_office(self):
shutil.copy('./tests/data/office_revision_session_ids.docx', './tests/data/clean.docx')
p = office.MSOfficeParser('./tests/data/clean.docx')
meta = p.get_meta()
self.assertIsNotNone(meta)
how_many_rsid = False
with zipfile.ZipFile('./tests/data/clean.docx') as zin:
for item in zin.infolist():
if not item.filename.endswith('.xml'):
continue
num = zin.read(item).decode('utf-8').lower().count('w:rsid')
how_many_rsid += num
self.assertEqual(how_many_rsid, 11)
ret = p.remove_all()
self.assertTrue(ret)
with zipfile.ZipFile('./tests/data/clean.cleaned.docx') as zin:
for item in zin.infolist():
if not item.filename.endswith('.xml'):
continue
num = zin.read(item).decode('utf-8').lower().count('w:rsid')
self.assertEqual(num, 0)
os.remove('./tests/data/clean.docx')
os.remove('./tests/data/clean.cleaned.docx')


@@ -1,12 +1,23 @@
#!/usr/bin/python3
#!/usr/bin/env python3
import unittest
import shutil
import os
import zipfile
import tempfile
from libmat2 import pdf, images, audio, office, parser_factory, torrent, harmless
from libmat2 import check_dependencies, video, archive, html
class TestCheckDependencies(unittest.TestCase):
def test_deps(self):
try:
ret = check_dependencies()
except RuntimeError:
return # this happens if not every dependency is installed
for value in ret.values():
self.assertTrue(value)
class TestParserFactory(unittest.TestCase):
@@ -26,6 +37,32 @@ class TestParameterInjection(unittest.TestCase):
self.assertEqual(meta['ModifyDate'], "2018:03:20 21:59:25")
os.remove('-ver')
def test_ffmpeg_injection(self):
try:
video._get_ffmpeg_path()
except RuntimeError:
raise unittest.SkipTest
shutil.copy('./tests/data/dirty.avi', './--output')
p = video.AVIParser('--output')
meta = p.get_meta()
self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
os.remove('--output')
def test_ffmpeg_injection_complete_path(self):
try:
video._get_ffmpeg_path()
except RuntimeError:
raise unittest.SkipTest
shutil.copy('./tests/data/dirty.avi', './tests/data/ --output.avi')
p = video.AVIParser('./tests/data/ --output.avi')
meta = p.get_meta()
self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
self.assertTrue(p.remove_all())
os.remove('./tests/data/ --output.avi')
os.remove('./tests/data/ --output.cleaned.avi')
class TestUnsupportedEmbeddedFiles(unittest.TestCase):
def test_odt_with_svg(self):
@@ -48,8 +85,8 @@ class TestGetMeta(unittest.TestCase):
self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
self.assertEqual(meta['creator'], "'Certified by IEEE PDFeXpress at 03/19/2016 2:56:07 AM'")
self.assertEqual(meta['DocumentID'], "uuid:4a1a79c8-404e-4d38-9580-5bc081036e61")
self.assertEqual(meta['PTEX.Fullbanner'], "This is pdfTeX, Version " \
"3.1415926-2.5-1.40.14 (TeX Live 2013/Debian) kpathsea " \
self.assertEqual(meta['PTEX.Fullbanner'], "This is pdfTeX, Version "
"3.1415926-2.5-1.40.14 (TeX Live 2013/Debian) kpathsea "
"version 6.1.1")
def test_torrent(self):
@@ -89,20 +126,26 @@ class TestGetMeta(unittest.TestCase):
p = audio.FLACParser('./tests/data/dirty.flac')
meta = p.get_meta()
self.assertEqual(meta['title'], 'I am so')
self.assertEqual(meta['Cover 0'], {'Comment': 'Created with GIMP'})
def test_docx(self):
p = office.MSOfficeParser('./tests/data/dirty.docx')
meta = p.get_meta()
self.assertEqual(meta['cp:lastModifiedBy'], 'Julien Voisin')
self.assertEqual(meta['dc:creator'], 'julien voisin')
self.assertEqual(meta['Application'], 'LibreOffice/5.4.5.1$Linux_X86_64 LibreOffice_project/40m0$Build-1')
self.assertEqual(meta['docProps/core.xml']['cp:lastModifiedBy'], 'Julien Voisin')
self.assertEqual(meta['docProps/core.xml']['dc:creator'], 'julien voisin')
self.assertEqual(meta['docProps/app.xml']['Application'], 'LibreOffice/5.4.5.1$Linux_X86_64 LibreOffice_project/40m0$Build-1')
def test_libreoffice(self):
p = office.LibreOfficeParser('./tests/data/dirty.odt')
meta = p.get_meta()
self.assertEqual(meta['meta:initial-creator'], 'jvoisin ')
self.assertEqual(meta['meta:creation-date'], '2011-07-26T03:27:48')
self.assertEqual(meta['meta:generator'], 'LibreOffice/3.3$Unix LibreOffice_project/330m19$Build-202')
self.assertEqual(meta['meta.xml']['meta:initial-creator'], 'jvoisin ')
self.assertEqual(meta['meta.xml']['meta:creation-date'], '2011-07-26T03:27:48')
self.assertEqual(meta['meta.xml']['meta:generator'], 'LibreOffice/3.3$Unix LibreOffice_project/330m19$Build-202')
p = office.LibreOfficeParser('./tests/data/weird_producer.odt')
meta = p.get_meta()
self.assertEqual(meta['mimetype']['create_system'], 'Windows')
self.assertEqual(meta['mimetype']['comment'], b'YAY FOR COMMENTS')
def test_txt(self):
p, mimetype = parser_factory.get_parser('./tests/data/dirty.txt')
@@ -110,6 +153,29 @@ class TestGetMeta(unittest.TestCase):
meta = p.get_meta()
self.assertEqual(meta, {})
def test_zip(self):
with zipfile.ZipFile('./tests/data/dirty.zip', 'w') as zout:
zout.write('./tests/data/dirty.flac')
zout.write('./tests/data/dirty.docx')
zout.write('./tests/data/dirty.jpg')
p, mimetype = parser_factory.get_parser('./tests/data/dirty.zip')
self.assertEqual(mimetype, 'application/zip')
meta = p.get_meta()
self.assertEqual(meta['tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
os.remove('./tests/data/dirty.zip')
def test_wmv(self):
p, mimetype = parser_factory.get_parser('./tests/data/dirty.wmv')
self.assertEqual(mimetype, 'video/x-ms-wmv')
meta = p.get_meta()
self.assertEqual(meta['EncodingSettings'], 'Lavf52.103.0')
def test_gif(self):
p, mimetype = parser_factory.get_parser('./tests/data/dirty.gif')
self.assertEqual(mimetype, 'image/gif')
meta = p.get_meta()
self.assertEqual(meta['Comment'], 'this is a test comment')
class TestRemovingThumbnails(unittest.TestCase):
def test_odt(self):
@@ -169,105 +235,6 @@ class TestRevisionsCleaning(unittest.TestCase):
os.remove('./tests/data/revision_clean.docx')
os.remove('./tests/data/revision_clean.cleaned.docx')
class TestDeepCleaning(unittest.TestCase):
def __check_deep_meta(self, p):
tempdir = tempfile.mkdtemp()
zipin = zipfile.ZipFile(p.filename)
zipin.extractall(tempdir)
for subdir, dirs, files in os.walk(tempdir):
for f in files:
complete_path = os.path.join(subdir, f)
inside_p, _ = parser_factory.get_parser(complete_path)
if inside_p is None:
continue
print('[+] %s is clean inside %s' %(complete_path, p.filename))
self.assertEqual(inside_p.get_meta(), {})
shutil.rmtree(tempdir)
def __check_zip_meta(self, p):
zipin = zipfile.ZipFile(p.filename)
for item in zipin.infolist():
self.assertEqual(item.comment, b'')
self.assertEqual(item.date_time, (1980, 1, 1, 0, 0, 0))
self.assertEqual(item.create_system, 3) # 3 is UNIX
def test_office(self):
shutil.copy('./tests/data/dirty.docx', './tests/data/clean.docx')
p = office.MSOfficeParser('./tests/data/clean.docx')
meta = p.get_meta()
self.assertIsNotNone(meta)
ret = p.remove_all()
self.assertTrue(ret)
p = office.MSOfficeParser('./tests/data/clean.cleaned.docx')
self.assertEqual(p.get_meta(), {})
self.__check_zip_meta(p)
self.__check_deep_meta(p)
os.remove('./tests/data/clean.docx')
os.remove('./tests/data/clean.cleaned.docx')
def test_libreoffice(self):
shutil.copy('./tests/data/dirty.odt', './tests/data/clean.odt')
p = office.LibreOfficeParser('./tests/data/clean.odt')
meta = p.get_meta()
self.assertIsNotNone(meta)
ret = p.remove_all()
self.assertTrue(ret)
p = office.LibreOfficeParser('./tests/data/clean.cleaned.odt')
self.assertEqual(p.get_meta(), {})
self.__check_zip_meta(p)
self.__check_deep_meta(p)
os.remove('./tests/data/clean.odt')
os.remove('./tests/data/clean.cleaned.odt')
class TestLightWeightCleaning(unittest.TestCase):
def test_pdf(self):
shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
p = pdf.PDFParser('./tests/data/clean.pdf')
meta = p.get_meta()
self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
ret = p.remove_all_lightweight()
self.assertTrue(ret)
p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
self.assertEqual(p.get_meta(), expected_meta)
os.remove('./tests/data/clean.pdf')
os.remove('./tests/data/clean.cleaned.pdf')
def test_png(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
p = images.PNGParser('./tests/data/clean.png')
meta = p.get_meta()
self.assertEqual(meta['Comment'], 'This is a comment, be careful!')
ret = p.remove_all_lightweight()
self.assertTrue(ret)
p = images.PNGParser('./tests/data/clean.cleaned.png')
self.assertEqual(p.get_meta(), {})
os.remove('./tests/data/clean.png')
os.remove('./tests/data/clean.cleaned.png')
class TestCleaning(unittest.TestCase):
def test_pdf(self):
shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
@@ -282,9 +249,11 @@ class TestCleaning(unittest.TestCase):
p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
self.assertEqual(p.get_meta(), expected_meta)
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.pdf')
os.remove('./tests/data/clean.cleaned.pdf')
os.remove('./tests/data/clean.cleaned.cleaned.pdf')
def test_png(self):
shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
@@ -298,9 +267,11 @@ class TestCleaning(unittest.TestCase):
p = images.PNGParser('./tests/data/clean.cleaned.png')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.png')
os.remove('./tests/data/clean.cleaned.png')
os.remove('./tests/data/clean.cleaned.cleaned.png')
def test_jpg(self):
shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
@@ -314,9 +285,11 @@ class TestCleaning(unittest.TestCase):
p = images.JPGParser('./tests/data/clean.cleaned.jpg')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.jpg')
os.remove('./tests/data/clean.cleaned.jpg')
os.remove('./tests/data/clean.cleaned.cleaned.jpg')
def test_mp3(self):
shutil.copy('./tests/data/dirty.mp3', './tests/data/clean.mp3')
@@ -330,9 +303,11 @@ class TestCleaning(unittest.TestCase):
p = audio.MP3Parser('./tests/data/clean.cleaned.mp3')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.mp3')
os.remove('./tests/data/clean.cleaned.mp3')
os.remove('./tests/data/clean.cleaned.cleaned.mp3')
def test_ogg(self):
shutil.copy('./tests/data/dirty.ogg', './tests/data/clean.ogg')
@@ -346,9 +321,11 @@ class TestCleaning(unittest.TestCase):
p = audio.OGGParser('./tests/data/clean.cleaned.ogg')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.ogg')
os.remove('./tests/data/clean.cleaned.ogg')
os.remove('./tests/data/clean.cleaned.cleaned.ogg')
def test_flac(self):
shutil.copy('./tests/data/dirty.flac', './tests/data/clean.flac')
@@ -362,9 +339,11 @@ class TestCleaning(unittest.TestCase):
p = audio.FLACParser('./tests/data/clean.cleaned.flac')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.flac')
os.remove('./tests/data/clean.cleaned.flac')
os.remove('./tests/data/clean.cleaned.cleaned.flac')
def test_office(self):
shutil.copy('./tests/data/dirty.docx', './tests/data/clean.docx')
@@ -378,10 +357,11 @@ class TestCleaning(unittest.TestCase):
p = office.MSOfficeParser('./tests/data/clean.cleaned.docx')
self.assertEqual(p.get_meta(), {})
self.assertTrue(p.remove_all())
os.remove('./tests/data/clean.docx')
os.remove('./tests/data/clean.cleaned.docx')
os.remove('./tests/data/clean.cleaned.cleaned.docx')
def test_libreoffice(self):
shutil.copy('./tests/data/dirty.odt', './tests/data/clean.odt')
@@ -395,9 +375,11 @@ class TestCleaning(unittest.TestCase):
        p = office.LibreOfficeParser('./tests/data/clean.cleaned.odt')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.odt')
        os.remove('./tests/data/clean.cleaned.odt')
        os.remove('./tests/data/clean.cleaned.cleaned.odt')

    def test_tiff(self):
        shutil.copy('./tests/data/dirty.tiff', './tests/data/clean.tiff')
@@ -411,9 +393,11 @@ class TestCleaning(unittest.TestCase):
        p = images.TiffParser('./tests/data/clean.cleaned.tiff')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.tiff')
        os.remove('./tests/data/clean.cleaned.tiff')
        os.remove('./tests/data/clean.cleaned.cleaned.tiff')

    def test_bmp(self):
        shutil.copy('./tests/data/dirty.bmp', './tests/data/clean.bmp')
@@ -427,9 +411,11 @@ class TestCleaning(unittest.TestCase):
        p = harmless.HarmlessParser('./tests/data/clean.cleaned.bmp')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.bmp')
        os.remove('./tests/data/clean.cleaned.bmp')
        os.remove('./tests/data/clean.cleaned.cleaned.bmp')

    def test_torrent(self):
        shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.torrent')
@@ -443,42 +429,47 @@ class TestCleaning(unittest.TestCase):
        p = torrent.TorrentParser('./tests/data/clean.cleaned.torrent')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.torrent')
        os.remove('./tests/data/clean.cleaned.torrent')
        os.remove('./tests/data/clean.cleaned.cleaned.torrent')
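
    # Metadata of archive-based documents is keyed per member file:
    # meta['meta.xml'][...] below replaces the former flat meta[...] access.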
    def test_odf(self):
        shutil.copy('./tests/data/dirty.odf', './tests/data/clean.odf')
        p = office.LibreOfficeParser('./tests/data/clean.odf')
        meta = p.get_meta()
        self.assertEqual(meta['meta:creation-date'], '2018-04-23T00:18:59.438231281')
        self.assertEqual(meta['meta.xml']['meta:creation-date'], '2018-04-23T00:18:59.438231281')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = office.LibreOfficeParser('./tests/data/clean.cleaned.odf')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.odf')
        os.remove('./tests/data/clean.cleaned.odf')
        os.remove('./tests/data/clean.cleaned.cleaned.odf')

    def test_odg(self):
        shutil.copy('./tests/data/dirty.odg', './tests/data/clean.odg')
        p = office.LibreOfficeParser('./tests/data/clean.odg')
        meta = p.get_meta()
        self.assertEqual(meta['dc:date'], '2018-04-23T00:26:59.385838550')
        self.assertEqual(meta['meta.xml']['dc:date'], '2018-04-23T00:26:59.385838550')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = office.LibreOfficeParser('./tests/data/clean.cleaned.odg')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.odg')
        os.remove('./tests/data/clean.cleaned.odg')
        os.remove('./tests/data/clean.cleaned.cleaned.odg')

    def test_txt(self):
        shutil.copy('./tests/data/dirty.txt', './tests/data/clean.txt')
@@ -492,6 +483,134 @@ class TestCleaning(unittest.TestCase):
        p = harmless.HarmlessParser('./tests/data/clean.cleaned.txt')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.txt')
        os.remove('./tests/data/clean.cleaned.txt')
        os.remove('./tests/data/clean.cleaned.cleaned.txt')
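
    # The video tests need FFmpeg, which is an optional dependency:
    # they are skipped when it is not installed.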
    def test_avi(self):
        try:
            video._get_ffmpeg_path()
        except RuntimeError:
            raise unittest.SkipTest
        shutil.copy('./tests/data/dirty.avi', './tests/data/clean.avi')
        p = video.AVIParser('./tests/data/clean.avi')
        meta = p.get_meta()
        self.assertEqual(meta['Software'], 'MEncoder SVN-r33148-4.0.1')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = video.AVIParser('./tests/data/clean.cleaned.avi')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.avi')
        os.remove('./tests/data/clean.cleaned.avi')
        os.remove('./tests/data/clean.cleaned.cleaned.avi')
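
    # The zip fixture is built on the fly; its metadata check recurses into
    # the embedded docx, down to the image it contains.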
    def test_zip(self):
        with zipfile.ZipFile('./tests/data/dirty.zip', 'w') as zout:
            zout.write('./tests/data/dirty.flac')
            zout.write('./tests/data/dirty.docx')
            zout.write('./tests/data/dirty.jpg')
        p = archive.ZipParser('./tests/data/dirty.zip')
        meta = p.get_meta()
        self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = archive.ZipParser('./tests/data/dirty.cleaned.zip')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/dirty.zip')
        os.remove('./tests/data/dirty.cleaned.zip')
        os.remove('./tests/data/dirty.cleaned.cleaned.zip')

    def test_mp4(self):
        try:
            video._get_ffmpeg_path()
        except RuntimeError:
            raise unittest.SkipTest
        shutil.copy('./tests/data/dirty.mp4', './tests/data/clean.mp4')
        p = video.MP4Parser('./tests/data/clean.mp4')
        meta = p.get_meta()
        self.assertEqual(meta['Encoder'], 'HandBrake 0.9.4 2009112300')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = video.MP4Parser('./tests/data/clean.cleaned.mp4')
        self.assertNotIn('Encoder', p.get_meta())
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.mp4')
        os.remove('./tests/data/clean.cleaned.mp4')
        os.remove('./tests/data/clean.cleaned.cleaned.mp4')

    def test_wmv(self):
        try:
            video._get_ffmpeg_path()
        except RuntimeError:
            raise unittest.SkipTest
        shutil.copy('./tests/data/dirty.wmv', './tests/data/clean.wmv')
        p = video.WMVParser('./tests/data/clean.wmv')
        meta = p.get_meta()
        self.assertEqual(meta['EncodingSettings'], 'Lavf52.103.0')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = video.WMVParser('./tests/data/clean.cleaned.wmv')
        self.assertNotIn('EncodingSettings', p.get_meta())
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.wmv')
        os.remove('./tests/data/clean.cleaned.wmv')
        os.remove('./tests/data/clean.cleaned.cleaned.wmv')

    def test_gif(self):
        shutil.copy('./tests/data/dirty.gif', './tests/data/clean.gif')
        p = images.GIFParser('./tests/data/clean.gif')
        meta = p.get_meta()
        self.assertEqual(meta['Comment'], 'this is a test comment')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = images.GIFParser('./tests/data/clean.cleaned.gif')
        self.assertNotIn('Comment', p.get_meta())  # the comment checked above must be gone
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.gif')
        os.remove('./tests/data/clean.cleaned.gif')
        os.remove('./tests/data/clean.cleaned.cleaned.gif')

    def test_html(self):
        shutil.copy('./tests/data/dirty.html', './tests/data/clean.html')
        p = html.HTMLParser('./tests/data/clean.html')
        meta = p.get_meta()
        self.assertEqual(meta['author'], 'jvoisin')
        ret = p.remove_all()
        self.assertTrue(ret)
        p = html.HTMLParser('./tests/data/clean.cleaned.html')
        self.assertEqual(p.get_meta(), {})
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.html')
        os.remove('./tests/data/clean.cleaned.html')
        os.remove('./tests/data/clean.cleaned.cleaned.html')


@@ -0,0 +1,106 @@
#!/usr/bin/env python3
import unittest
import shutil
import os
from libmat2 import pdf, images, torrent
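

# Lightweight cleaning sets parser.lightweight_cleaning = True before calling
# remove_all(); it is less thorough and may keep some harmless technical
# metadata behind, as the tiff test below shows.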
class TestLightWeightCleaning(unittest.TestCase):
    def test_pdf(self):
        shutil.copy('./tests/data/dirty.pdf', './tests/data/clean.pdf')
        p = pdf.PDFParser('./tests/data/clean.pdf')
        meta = p.get_meta()
        self.assertEqual(meta['producer'], 'pdfTeX-1.40.14')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = pdf.PDFParser('./tests/data/clean.cleaned.pdf')
        expected_meta = {'creation-date': -1, 'format': 'PDF-1.5', 'mod-date': -1}
        self.assertEqual(p.get_meta(), expected_meta)
        os.remove('./tests/data/clean.pdf')
        os.remove('./tests/data/clean.cleaned.pdf')

    def test_png(self):
        shutil.copy('./tests/data/dirty.png', './tests/data/clean.png')
        p = images.PNGParser('./tests/data/clean.png')
        meta = p.get_meta()
        self.assertEqual(meta['Comment'], 'This is a comment, be careful!')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = images.PNGParser('./tests/data/clean.cleaned.png')
        self.assertEqual(p.get_meta(), {})
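
        # Cleaning the same file a second time, while its .cleaned output
        # already exists, must still succeed.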
        p = images.PNGParser('./tests/data/clean.png')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        os.remove('./tests/data/clean.png')
        os.remove('./tests/data/clean.cleaned.png')

    def test_jpg(self):
        shutil.copy('./tests/data/dirty.jpg', './tests/data/clean.jpg')
        p = images.JPGParser('./tests/data/clean.jpg')
        meta = p.get_meta()
        self.assertEqual(meta['Comment'], 'Created with GIMP')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = images.JPGParser('./tests/data/clean.cleaned.jpg')
        self.assertEqual(p.get_meta(), {})
        os.remove('./tests/data/clean.jpg')
        os.remove('./tests/data/clean.cleaned.jpg')

    def test_torrent(self):
        shutil.copy('./tests/data/dirty.torrent', './tests/data/clean.torrent')
        p = torrent.TorrentParser('./tests/data/clean.torrent')
        meta = p.get_meta()
        self.assertEqual(meta['created by'], b'mktorrent 1.0')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = torrent.TorrentParser('./tests/data/clean.cleaned.torrent')
        self.assertEqual(p.get_meta(), {})
        os.remove('./tests/data/clean.torrent')
        os.remove('./tests/data/clean.cleaned.torrent')

    def test_tiff(self):
        shutil.copy('./tests/data/dirty.tiff', './tests/data/clean.tiff')
        p = images.TiffParser('./tests/data/clean.tiff')
        meta = p.get_meta()
        self.assertEqual(meta['ImageDescription'], 'OLYMPUS DIGITAL CAMERA ')
        p.lightweight_cleaning = True
        ret = p.remove_all()
        self.assertTrue(ret)
        p = images.TiffParser('./tests/data/clean.cleaned.tiff')
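        # Lightweight cleaning leaves these harmless technical tags in place: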
        self.assertEqual(p.get_meta(),
                         {
                             'Orientation': 'Horizontal (normal)',
                             'ResolutionUnit': 'inches',
                             'XResolution': 72,
                             'YResolution': 72
                         }
                        )
        os.remove('./tests/data/clean.tiff')
        os.remove('./tests/data/clean.cleaned.tiff')

tests/test_policy.py

@@ -0,0 +1,31 @@
#!/usr/bin/env python3
import unittest
import shutil
import os
from libmat2 import office, UnknownMemberPolicy
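

# UnknownMemberPolicy tells the office parser what to do with archive members
# it does not know how to clean: OMIT drops them from the output, while KEEP
# keeps them as-is.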
class TestPolicy(unittest.TestCase):
    def test_policy_omit(self):
        shutil.copy('./tests/data/embedded.docx', './tests/data/clean.docx')
        p = office.MSOfficeParser('./tests/data/clean.docx')
        p.unknown_member_policy = UnknownMemberPolicy.OMIT
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.docx')
        os.remove('./tests/data/clean.cleaned.docx')

    def test_policy_keep(self):
        shutil.copy('./tests/data/embedded.docx', './tests/data/clean.docx')
        p = office.MSOfficeParser('./tests/data/clean.docx')
        p.unknown_member_policy = UnknownMemberPolicy.KEEP
        self.assertTrue(p.remove_all())
        os.remove('./tests/data/clean.docx')
        os.remove('./tests/data/clean.cleaned.docx')

    def test_policy_unknown(self):
        shutil.copy('./tests/data/embedded.docx', './tests/data/clean.docx')
        p = office.MSOfficeParser('./tests/data/clean.docx')
        with self.assertRaises(ValueError):
            p.unknown_member_policy = UnknownMemberPolicy('unknown_policy_name_totally_invalid')
        os.remove('./tests/data/clean.docx')