Apryse is an industry-leading provider of document software development technology, committed to delivering cutting-edge solutions. They are seeking a Full Stack Data Discovery Engineer to design and implement systems that analyze technology usage across various ecosystems, focusing on building data pipelines and dashboards to transform raw data into actionable insights.

Responsibilities:

Own the full stack: Design, build and optimize scalable data pipelines to discover OSINT and software usage across a wide public ecosystem
Pipeline development: Develop APIs, microservices, crawlers, and document fingerprinting to gather data securely and efficiently. Implement backoff/caching and data normalization, and persist results to SQL/NoSQL indexes
Data Discovery: Conduct systematic searches across the web, public databases, developer ecosystems and other platforms to identify potential external data repositories relevant to organizational objectives
Metadata and Attribution Analysis: Programmatically uncover and analyze metadata associated with identified data sources to understand data structure, content, quality, and potential use cases
Signals & scoring: Develop heuristics/ML‑lite ranking to identify relevant artifacts, deduplicate them, and assign confidence scores
Data Governance: Ensure data quality, security, compliance and governance
Productize discovery: Build internal tools that let non‑engineers run searches, review candidates, and export leads quickly and safely
Documentation and Reporting: Document data structures, origins (data lineage), and quality issues. Create clear, concise reports and presentations to communicate findings and recommendations to technical and non-technical stakeholders
Collaboration: Work closely with data stewards, data architects, and internal business units to define data requirements and facilitate the integration of new data sources
Innovation and Scale: Continuously explore new data sources, improve attribution logic and propose ML-based enhancements to finding and classifying data

Requirements:

Bachelor's degree in Computer Science, Engineering, Library Science, Information Systems, Data Management, or a related field
1-5 years of proven experience as a full-stack developer and data engineer
Back-end: Python, SQL, Java and Node.js
Front-end: Modern JS/TS + React, component libraries, auth patterns, state management
Data & search: schema design, dedup/near‑dup logic, Elasticsearch/OpenSearch; building usable search/triage UIs
Acquisition: Scrapy/Playwright/Puppeteer; API design with rate‑limit/backoff; ethical crawling
Experience with cloud-native architecture and containerization
Familiarity with metadata standards (e.g., Dublin Core, XML) and data management tools
Exceptional attention to detail and strong analytical thinking skills
Excellent written and verbal communication skills, with the ability to translate technical findings into business insights
Strong problem-solving aptitude and the ability to work independently and collaboratively in a fast-paced environment
Master's degree
Knowledge of data visualization tools (e.g. Power BI, Tableau) to present findings
Experience building internal platforms/tools used by end users or GTM teams
