Available detectors
The table detection filter can use different techniques to find tables in a document. The filter is needed to compare table content within the context of its rows, columns, and cells, which reduces false-positive differences. Table detection is intended for documents that lack structural markup.
The available detection methods differ in the types of tables they can detect, as well as in performance and accuracy.
Detect tables by border
This detector finds tables by their visual borders or cell backgrounds. Detection is very accurate, has minimal performance overhead, and needs no external tools, which is why this detector is always active for the table filter.
AI model for tables
The AI model for table detection uses a local AI model to find tables that have no visual borders and can only be identified by the structure of their textual content. This makes it possible to detect tables that the default detector cannot identify. However, the AI model needs considerably more computing resources than the default detector, so it should be made available to all users only if required.
The AI model relies on external tools that must be installed and configured before this detector can be used.
Installing the required tools, dependencies, and AI models takes up to 1 GB of disk space. The list of required Python dependencies can be found here.
Note: Using this detector may automatically download and install the required dependencies and models.
Python installation
The AI model requires the Python runtime to execute. This runtime must be installed and available to the user running i-net PDFC.
To install Python, proceed as follows:
Windows
- Download Python
  - Visit https://www.python.org/downloads/windows/ and download the latest Python 3 installer.
- Run the Installer
  - Run the installer as an administrator (right-click, then choose "Run as administrator").
  - Check "Install for all users" to ensure system-wide access.
  - Enable "Add Python to PATH" for command-line accessibility.
- Verify Installation
  - Open Command Prompt and run `python --version` to confirm Python is accessible.
  - Ensure the process user has read/execute permissions for the Python installation directory.
macOS
- Download Python
  - Visit https://www.python.org/downloads/macos/ and download the macOS installer for the latest version.
- Run the Installer
  - Open the `.pkg` file and follow the prompts.
  - The installer automatically places Python in `/Applications/Python X.X` and adds it to `/usr/local/bin`, accessible to all users.
- Verify Installation
  - Open Terminal and run `python3 --version`.
  - Ensure the process user has execute permissions for `/usr/local/bin/python3` (default permissions typically suffice).
Linux (Ubuntu/Debian-based example)
- Install Python via Package Manager
  - Open a terminal and update the package list: `sudo apt update`.
  - Install Python: `sudo apt install python3 python3-pip -y`.
  - This installs Python system-wide, typically in `/usr/bin/python3`.
- Verify Installation
  - Run `python3 --version` to confirm.
  - Ensure the process user has execute permissions (default for `/usr/bin/python3`).
- Alternative (Manual Installation)
  - Download the source from python.org, compile, and install to `/usr/local/bin` using:

        tar -xzf Python-X.X.X.tar.gz
        cd Python-X.X.X
        ./configure --prefix=/usr/local
        make
        sudo make install

  - Verify with `/usr/local/bin/python3 --version`.
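The verification steps for all three platforms come down to the same two questions: can the current user find an interpreter on PATH, and does it answer a version query? The following is a minimal sketch of that check, not the mechanism i-net PDFC itself uses. The function names `find_python` and `python_version` are hypothetical, and the script assumes it is run as the same user that runs i-net PDFC.

```python
import shutil
import subprocess
import sys


def find_python(candidates=("python3", "python")):
    """Return the first interpreter found on PATH, or None (hypothetical helper)."""
    for name in candidates:
        # shutil.which only returns files the current user is allowed to execute.
        path = shutil.which(name)
        if path:
            return path
    return None


def python_version(executable):
    """Run '<executable> --version' and return the text it reports."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        timeout=10,
    )
    # Python 3 prints the version to stdout; very old releases used stderr.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    print("This script runs under:", sys.executable)
    path = find_python()
    if path is None:
        print("No Python interpreter found on PATH for this user.")
    else:
        print(path, "->", python_version(path))
```

Running the script under the service account rather than an administrator account is the useful case, because a PATH entry that exists only for the installing user is a common reason for the interpreter not being found.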
Model installation
The AI model and the required dependencies are downloaded automatically the first time the plugin is used. This download can amount to up to 1 GB for the dependencies and the AI model combined.
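Because the first use can download up to 1 GB, it may be worth confirming beforehand that the target volume has enough room. The sketch below is an illustration only: the documentation does not state where the files are stored, so the directory passed to the hypothetical helper `has_free_space` is a placeholder that you would replace with the actual location.

```python
import shutil

# 1 GiB, slightly above the "up to 1 GB" figure quoted in this section.
REQUIRED_BYTES = 1024 ** 3


def has_free_space(directory, required_bytes=REQUIRED_BYTES):
    """Return True if the volume holding 'directory' has enough free space."""
    return shutil.disk_usage(directory).free >= required_bytes


if __name__ == "__main__":
    # "." is a placeholder (assumption); the real download location is not
    # documented in this section.
    print("Enough free space:", has_free_space("."))
```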
Model verification
The configuration option 'AI model for tables' is enabled once i-net PDFC has verified that Python is installed and that all required dependencies as well as the AI model are available.
If any precondition is not met, an error is displayed. Common issues are:
- Python is installed but not available to the user running i-net PDFC.
- A dependency is missing. Establish a connection to the internet so the dependency can be downloaded.
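To narrow down a missing-dependency error by hand, you can ask the interpreter which modules it is able to locate. This is a rough sketch rather than the check i-net PDFC performs: `missing_modules` is a hypothetical helper, and the module names in the example are placeholders, since the actual dependency list is published separately and is not reproduced here.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that this interpreter cannot locate."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Placeholder names (assumption): substitute the modules from the
    # published dependency list.
    print(missing_modules(["json", "no_such_module_xyz_123"]))
```

Run the check with the same interpreter and user account that i-net PDFC uses, because packages installed for a different user or a different Python installation will not be visible.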
