Solving Geotechnical Data's 3 Billion Dollar Georeferencing Crisis: How DIGGS and Historic Data Conversions Are Transforming Our Industry
Every day, geotechnical engineers and state transportation departments generate invaluable subsurface data through boring logs, SPT tests, and soil investigations. This data represents millions of taxpayer dollars and decades of engineering expertise. Yet across the United States, we're systematically discarding nearly half of this critical information due to a single, solvable problem: inadequate georeferencing of borehole and site locations.
After processing datasets from more than half of state DOTs and analyzing over 500,000 boring logs in the Geosetta database, we've uncovered a sobering reality about the state of geotechnical data management. The recent integration of Alabama DOT's 24-year project history provides a stark example of this crisis.
The Hidden Crisis: Quantifying Our Data Loss
The numbers from Alabama DOT's recent data integration tell a story that repeats across every state we've processed:
- 970 ALDOT GPJ files processed
- 69,478 total boring log records
- 35,156 successfully georeferenced locations
- 50.6% coordinate extraction success rate (35,156 of 69,478 records)
The Implications
What this means in practical terms is devastating: 34,322 boring logs from Alabama DOT alone—representing approximately $102.9 million in taxpayer-funded geotechnical investigations—lack reliable geographic coordinates. Extrapolating this pattern across all 50 states and federal agencies, we estimate that over $3 billion in publicly-funded geotechnical investigations remain effectively unusable due to georeferencing failures nationwide.
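The ALDOT figures above follow from simple arithmetic on the reported counts. Here is a minimal Python sketch of that arithmetic. The per-boring cost is not stated in this post, so the value computed below is only what the $102.9 million figure implies, not a published unit cost.

```python
# Counts reported above for the ALDOT integration.
total_records = 69_478
georeferenced = 35_156
stated_value_usd = 102.9e6  # dollar figure quoted in this post

# Records without usable coordinates.
ungeoreferenced = total_records - georeferenced

# Share of all records that cannot be placed on a map.
ungeoreferenced_share = ungeoreferenced / total_records

# Implied average cost per boring (an inference, not a published number).
implied_cost_per_boring = stated_value_usd / ungeoreferenced

print(ungeoreferenced)
print(f"{ungeoreferenced_share:.1%}")
print(f"${implied_cost_per_boring:,.0f}")
```

Swapping in another agency's counts gives a quick first estimate of how much investigation value is stranded there, under the same implied unit cost.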
Across all state DOTs in our database, we estimate that half or more of their historic geotechnical investigation data remains effectively unusable for regional analysis, machine learning applications, and cross-jurisdictional projects due to georeferencing failures.
This isn't just about lost money—it's about lost opportunities for:
- Regional subsurface modeling and prediction
- Multi-state transportation corridor analysis
- Climate resilience planning using historical data
- Machine learning model training for soil behavior prediction
- Emergency response planning with reliable subsurface intelligence
The Root Cause: Undocumented or Inconsistent Coordinate Reference Systems
Our analysis reveals that the georeferencing crisis stems from four primary issues:
- Project-Specific Stationing Systems: The most common issue involves boring logs referenced only by project stationing (e.g., "STA 125+50, 85' LT") where the stationing baseline is specific to the particular project and the survey control data needed for conversion is not stored with the boring log data.
- Missing Coordinate Reference System (CRS) Codes: Legacy boring logs contain coordinates without specifying their coordinate reference system. A coordinate pair like "567,890.5, 123,456.7" could represent State Plane, UTM, or any number of local coordinate systems.
- Inconsistent CRS Usage: Within a single project, different consultants may use different coordinate systems without proper documentation.
- Format Variations: Coordinates appear in decimal degrees, degrees-minutes-seconds, feet, meters, and various state plane zones without consistent identification.
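To make the first issue concrete, here is a small Python sketch that pulls the station and offset out of a description string like the ones quoted above. The function name, the regular expression, and the sign convention (left of the baseline is negative) are our assumptions for illustration. Real descriptions vary widely, and this pattern only covers the "STA 125+50, 85' LT" style.

```python
import re

# Matches strings such as "STA 125+50, 85' LT" or "STA. 55+00, 85' LT OF SURVEY C/L".
_STATION_RE = re.compile(
    r"STA\.?\s*(?P<hundreds>\d+)\+(?P<feet>\d+(?:\.\d+)?)"
    r"(?:\s*,\s*(?P<offset>\d+(?:\.\d+)?)\s*'\s*(?P<side>LT|RT))?",
    re.IGNORECASE,
)


def parse_station_offset(text):
    """Return (station_ft, offset_ft), or None if no station is found.

    Station "125+50" means 125 hundreds of feet plus 50 feet, i.e. 12,550 ft.
    Offsets left of the baseline are returned as negative (assumed convention).
    """
    match = _STATION_RE.search(text)
    if match is None:
        return None
    station_ft = int(match["hundreds"]) * 100 + float(match["feet"])
    offset_ft = 0.0
    if match["offset"] is not None:
        offset_ft = float(match["offset"])
        if match["side"].upper() == "LT":
            offset_ft = -offset_ft
    return station_ft, offset_ft


print(parse_station_offset("STA 125+50, 85' LT"))
```

Parsing recovers only the position along a project baseline. Without the survey control for that baseline, the result still cannot be placed on a map.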
Consider this real example from our Alabama DOT dataset processing:
<!-- Legacy DIGGS boring log with station/offset reference -->
<samplingFeature>
  <Borehole gml:id="Borehole_IM_I065_390_BH1">
    <gml:description>IM-I 065 (390) SOIL SURVEY, STA. 55+00, 85' LT OF SURVEY C/L</gml:description>
    <gml:name>BH-1</gml:name>
    <projectRef xlink:href="#p1"/>
    <referencePoint>
      <PointLocation gml:id="Point_Location_STA_55_00">
        <gml:pos>55.0 -85.0 0.0</gml:pos>
      </PointLocation>
    </referencePoint>
    ...
  </Borehole>
</samplingFeature>
This DIGGS XML snippet from a real ALDOT boring log shows the exact problem: the coordinates of the borehole reference point are taken from the borehole description and are station/offset values (55.0, -85.0) relative to an undefined survey centerline. Without a coordinate reference system and the corresponding survey control, a geographic position cannot be determined, so this boring log cannot be reliably positioned on any map or integrated with other datasets. It is one of the 34,322 ALDOT boring logs that couldn't be georeferenced, which together represent $102.9 million in lost taxpayer-funded investigations.
Since its inception, DIGGS, which is based on Geography Markup Language (GML), has had the capability of storing necessary CRS information in a GML-defined srsName attribute. However, GML's implementation of srsName is quite general, which can complicate or hinder interoperability. Specifically:
- srsName is optional, so a DIGGS instance can pass schema validation without a CRS reference, as is the case in the above example.
- The value of srsName can be any string that conforms to the anyURI datatype. Without a standardized syntax for the srsName value, it may be impossible to determine what CRS is being referenced.
- srsName can be specified at the geometry object level (e.g., PointLocation above) or at the position level (gml:pos), which complicates extraction of the appropriate srsName.
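The extraction difficulty described above can be sketched in a few lines of Python. For every gml:pos or gml:posList, the function below looks for srsName on the element itself and then on each enclosing element in turn. This is an illustrative sketch, not the DIGGS validation tool. The function name is ours, the sample omits the DIGGS default namespace for brevity, and we assume the GML 3.2 namespace URI.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # assumed GML 3.2 namespace


def resolve_srs_names(xml_text):
    """Return (coordinate text, srsName or None) for every gml:pos / gml:posList."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent lookup.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    found = []
    for tag in ("pos", "posList"):
        for element in root.iter(f"{{{GML_NS}}}{tag}"):
            node, srs_name = element, None
            # Walk upward until some enclosing element declares srsName.
            while node is not None and srs_name is None:
                srs_name = node.get("srsName")
                node = parent_of.get(node)
            found.append(((element.text or "").strip(), srs_name))
    return found


# Simplified version of the legacy example (no srsName anywhere).
LEGACY = """
<Borehole xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="b1">
  <referencePoint>
    <PointLocation gml:id="p1">
      <gml:pos>55.0 -85.0 0.0</gml:pos>
    </PointLocation>
  </referencePoint>
</Borehole>
"""

for coords, srs_name in resolve_srs_names(LEGACY):
    print(coords, "->", srs_name or "NO CRS DECLARED")
```

Because srsName may sit at more than one level, a reader has to search upward like this for every coordinate, which is exactly the ambiguity the new guidance aims to remove.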
The above issues have prompted the DIGGS team to implement new guidance for the use of srsName, along with a semantic validation tool to enforce the new rules. The guidance and the tool will ship with the soon-to-be-released version 3.0 update of DIGGS.
What's Changing in DIGGS 3.0?
The new guidelines include:
- The srsName attribute MUST be present wherever coordinates are reported in a DIGGS instance, and it MUST be placed at the geometry object level or, for complex geometries, at the topmost geometry object level.
- The value of srsName MUST directly reference the GML definition of a coordinate reference system. This should be either a georeferenced CRS (e.g., a geographic, projected, or compound CRS that includes both horizontal and vertical CRSs) or an Engineering CRS (as might be used for station/offset) whose definition contains sufficient information to transform the local coordinates to a georeferenced CRS.
- Implementation of a new semantic validation tool that includes modules to test and flag non-compliant DIGGS instances if the above conditions are not met.
Specific guidance and examples of srsName usage will be covered in a future blog post. For cases where georeferenced CRSs are defined in IOGP's EPSG Geodetic Parameter Dataset (https://epsg.org) or by the Open Geospatial Consortium (OGC), we can take advantage of existing CRS resolver web services to return the appropriate GML definition, as shown here:
<!-- DIGGS 3.0 with compliant CRS reference -->
<samplingFeature>
  <Borehole gml:id="Borehole_IM_I065_390_BH1">
    <gml:description>IM-I 065 (390) SOIL SURVEY, STA. 55+00, 85' LT OF SURVEY C/L</gml:description>
    <gml:name>BH-1</gml:name>
    <investigationTarget>Natural Ground</investigationTarget>
    <projectRef xlink:href="#p1"/>
    <locality>
      <Locality gml:id="Locality_ydg_wh3_rgc">
        <station uom="ft">55</station>
        <offset uom="ft">-85</offset>
        <offsetDirection>left</offsetDirection>
      </Locality>
    </locality>
    <referencePoint>
      <PointLocation gml:id="Point_Location_STA_55_00" srsDimension="3" srsName="http://www.opengis.net/def/crs-compound?1=http://www.opengis.net/def/crs/EPSG/0/26929%262=http://www.opengis.net/def/crs/EPSG/0/6360">
        <gml:pos>654321.45 1234567.89 287.5</gml:pos>
      </PointLocation>
    </referencePoint>
    <centerLine>
      <LinearExtent gml:id="Linear_Extent_0" srsDimension="3" srsName="http://www.opengis.net/def/crs-compound?1=http://www.opengis.net/def/crs/EPSG/0/26929%262=http://www.opengis.net/def/crs/EPSG/0/6360">
        <gml:posList>654321.45 1234567.89 287.5 654321.45 1234567.89 265.0</gml:posList>
      </LinearExtent>
    </centerLine>
    ...
  </Borehole>
</samplingFeature>
The srsName attribute identifies the CRS for the coordinates of the borehole's referencePoint and centerLine properties. The URL specified uses the OGC CRS Definition Resolver to return a GML definition for a 3D compound CRS that includes the horizontal CRS NAD83 / Alabama East (EPSG Code 26929) and the vertical CRS NAVD88 height in ftUS (EPSG Code 6360). The original station/offset information can still be included in the gml:description property and/or the locality property. With this information, processing systems can reliably interpret and convert these coordinates to any other coordinate system, integrate them with regional datasets, and display them on web maps.
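For readers who generate DIGGS files programmatically, the srsName value in the example above can be assembled from the two EPSG codes. The helper names below are ours, and the default separator follows the form written in this post's example (`%26`). If your tooling writes a literal ampersand instead, it must be escaped as `&amp;` inside an XML attribute. Treat this as a sketch rather than official DIGGS tooling.

```python
import re

OGC_CRS_BASE = "http://www.opengis.net/def/crs/EPSG/0/"
OGC_COMPOUND_BASE = "http://www.opengis.net/def/crs-compound?"


def epsg_crs_url(code):
    """URL for a single EPSG CRS, in the form used in the example above."""
    return f"{OGC_CRS_BASE}{int(code)}"


def compound_crs_url(horizontal_code, vertical_code, separator="%26"):
    """Compound CRS URL: horizontal component first, vertical second."""
    return (
        f"{OGC_COMPOUND_BASE}1={epsg_crs_url(horizontal_code)}"
        f"{separator}2={epsg_crs_url(vertical_code)}"
    )


def epsg_codes_in(srs_name):
    """List the EPSG codes referenced by an srsName URL, in order."""
    return [int(code) for code in re.findall(r"/def/crs/EPSG/0/(\d+)", srs_name)]


# NAD83 / Alabama East (26929) with NAVD88 height in ftUS (6360).
url = compound_crs_url(26929, 6360)
print(url)
print(epsg_codes_in(url))
```

Going the other way, `epsg_codes_in` gives downstream software a quick way to see which horizontal and vertical systems a file claims to use before attempting any transformation.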
Success
This new guidance will revolutionize geotechnical data interoperability. Going forward, our profession can unambiguously georeference any DIGGS file in any coordinate system, enabling:
- Seamless data sharing between jurisdictions using different coordinate systems
- Automatic coordinate conversion for web mapping and GIS applications
- Reliable integration of multi-state project datasets
- Machine learning training on properly georeferenced datasets
Free Coordinate Conversion API: Enabling Interoperability
To support DIGGS 3.0 compliance and enable seamless coordinate system interoperability, we've also developed a comprehensive coordinate conversion API that's freely available through the Geosetta DIGGS Tools platform.
Current API Capabilities
Our coordinate conversion service supports over 6,000 EPSG coordinate reference systems and provides:
- Coordinate Reference System Conversion: Convert between any EPSG coordinate reference systems for cross-jurisdictional projects
- DIGGS 3.0 Compliance Support: Ensure coordinates are in the correct system before creating DIGGS files
- Batch Processing: Convert up to 25 coordinates per request
- US State Plane Finder: Identify the correct EPSG codes for DOT projects by state
- Coordinate Validation: Verify that coordinates fall within expected geographic bounds
- Elevation Integration: Automatic elevation data retrieval using the Open Elevation API
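The coordinate validation idea in the list above is easy to prototype locally. The Python sketch below checks latitude/longitude pairs against a rectangular bounding box. It does not describe how the API performs its check. The class and function names are ours, and the Alabama box is a rough, padded approximation chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Rough, padded box around Alabama (illustrative values, not authoritative).
ALABAMA_APPROX = BoundingBox(min_lat=30.0, max_lat=35.2, min_lon=-88.6, max_lon=-84.8)


def flag_out_of_bounds(points, box):
    """Return the (lat, lon) pairs that fall outside the box."""
    return [(lat, lon) for lat, lon in points if not box.contains(lat, lon)]


points = [
    (32.38, -86.30),   # plausible point in central Alabama
    (-86.30, 32.38),   # same values with latitude and longitude swapped
]
print(flag_out_of_bounds(points, ALABAMA_APPROX))
```

A check this simple already catches swapped axes and coordinates converted with the wrong CRS, both of which usually land far outside the expected state.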
Real-World API Usage Examples
Converting Virginia State Plane to WGS84 for Web Mapping:
POST https://diggs.geosetta.org/api/georef/convert
{
"epsg_code": 2283,
"x": 11688443.2,
"y": 6929154.3
}
Response:
{
"success": true,
"original_coordinate": {
"epsg_code": 2283,
"x": 11688443.2,
"y": 6929154.3,
"system_name": "NAD83 / Virginia North (ftUS)"
},
"converted_coordinate": {
"latitude": 38.9234567,
"longitude": -77.5678901,
"elevation_m": 45.2,
"elevation_ft": 148.3
}
}
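The request and response above can be wrapped in a short Python client using only the standard library. The endpoint and field names come from the example above. The Content-Type header, the absence of authentication, the timeout, and the function names are our assumptions for this sketch, so check the API documentation before relying on it.

```python
import json
import urllib.request

CONVERT_URL = "https://diggs.geosetta.org/api/georef/convert"


def build_convert_request(epsg_code, x, y, url=CONVERT_URL):
    """Build (but do not send) a POST request shaped like the example above."""
    payload = {"epsg_code": epsg_code, "x": x, "y": y}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def extract_lat_lon(response_body):
    """Pull (latitude, longitude) out of a response shaped like the one above."""
    doc = json.loads(response_body)
    if not doc.get("success"):
        raise ValueError("conversion was not successful")
    converted = doc["converted_coordinate"]
    return converted["latitude"], converted["longitude"]


def convert(epsg_code, x, y, timeout=10):
    """Send the request and return (latitude, longitude). Requires network access."""
    request = build_convert_request(epsg_code, x, y)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_lat_lon(response.read())
```

Calling `convert(2283, 11688443.2, 6929154.3)` would send the Virginia example shown above. Keeping request building and response parsing in separate functions makes both easy to test without touching the network.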
Batch Processing for Multi-State Highway Project (coordinates listed in order: Virginia, Maryland, Pennsylvania):
POST https://diggs.geosetta.org/api/georef/convert/batch
{
"coordinates": [
{"epsg_code": 2283, "x": 11688443.2, "y": 6929154.3},
{"epsg_code": 2248, "x": 1312456.7, "y": 558934.2},
{"epsg_code": 2272, "x": 2689123.4, "y": 245678.9}
]
}
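Because the batch endpoint accepts up to 25 coordinates per request, larger datasets need to be split before sending. A minimal Python sketch of that splitting step follows. The function names are ours, and the payload shape simply mirrors the batch example above.

```python
import json

BATCH_LIMIT = 25  # per-request limit listed under the API capabilities


def chunk(items, size=BATCH_LIMIT):
    """Split a list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_payloads(coordinates, size=BATCH_LIMIT):
    """Return one JSON request body per group, shaped like the batch example."""
    return [json.dumps({"coordinates": group}) for group in chunk(coordinates, size)]


# 60 hypothetical Virginia State Plane coordinates -> three request bodies.
coordinates = [
    {"epsg_code": 2283, "x": 11688443.2 + i, "y": 6929154.3 + i}
    for i in range(60)
]
payloads = batch_payloads(coordinates)
print(len(payloads))
print([len(json.loads(p)["coordinates"]) for p in payloads])
```

Each payload can then be POSTed to the batch endpoint in turn, ideally with a short pause between requests to be courteous to a free service.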
Legacy Data Recovery Process
For datasets like ALDOT's station/offset boring logs, one possible recovery process would involve:
- Survey Control Recovery: Locate original highway alignment and survey control data
- Manual Coordinate Calculation: Transform station/offset to state plane coordinates using survey data
- API Validation & Conversion: Use our API to verify coordinates and convert between systems if needed
- DIGGS 3.0 File Creation: Generate compliant files with proper srsName attributes
ALDOT Success Example: Of the 34,322 boring logs that couldn't be automatically georeferenced, many contained station/offset references like "STA. 55+00, 85' LT OF SURVEY C/L." With access to original survey control data, these could potentially be recovered, converting $102.9 million in investigation value back to usable geospatial data.
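The manual coordinate calculation step can be illustrated for the simplest case, a boring located off a straight tangent whose starting point and grid azimuth are known from survey control. The Python sketch below makes several assumptions: no curves, spirals, or station equations; stationing and grid coordinates in the same linear unit; azimuth measured clockwise from grid north; and left offsets negative. All numeric values are hypothetical.

```python
import math


def station_offset_to_grid(station, offset, start_station,
                           start_easting, start_northing, azimuth_deg):
    """Locate a point from station/offset along a straight tangent.

    `offset` is positive to the right of the alignment and negative to the
    left, looking in the direction of increasing station.
    """
    along = station - start_station
    az = math.radians(azimuth_deg)
    # Move `along` in the alignment direction, then `offset` perpendicular
    # to it (90 degrees clockwise from the alignment for positive offsets).
    easting = start_easting + along * math.sin(az) + offset * math.cos(az)
    northing = start_northing + along * math.cos(az) - offset * math.sin(az)
    return easting, northing


# Hypothetical control: tangent starts at station 0+00 at (1000, 2000), heading due north.
easting, northing = station_offset_to_grid(
    station=100.0, offset=-85.0, start_station=0.0,
    start_easting=1000.0, start_northing=2000.0, azimuth_deg=0.0,
)
print(round(easting, 3), round(northing, 3))
```

Real alignments include horizontal curves and station equations, so production recovery work would rely on the original alignment geometry rather than a single tangent.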
Cross-Jurisdictional Project Integration
The API's primary strength is enabling seamless data sharing between jurisdictions using different coordinate systems:
- Virginia DOT provides data in State Plane North (EPSG:2283)
- Maryland DOT uses State Plane coordinates (EPSG:2248)
- API converts all data to WGS84 (EPSG:4326) for web visualization
- Result: Unified I-95 corridor dataset for regional analysis
Multi-State Project Integration
Consider a typical interstate highway improvement project crossing Virginia, Maryland, and Pennsylvania. Previously, each state's boring logs would use different coordinate systems:
- Virginia: State Plane North (EPSG:2283) in US Survey Feet
- Maryland: State Plane (EPSG:2248) in US Survey Feet
- Pennsylvania: State Plane South (EPSG:2272) in US Survey Feet
With DIGGS 3.0 compliance and our coordinate conversion API, the project workflow becomes:
- Data Collection: Each state provides DIGGS 3.0 files with mandatory srsName references
- Coordinate Standardization: All coordinates are automatically converted to WGS84 for web mapping
- Unified Analysis: Regional subsurface modeling uses consistently referenced spatial data
- Visualization: Interactive maps display boring logs from all three states in a unified coordinate system
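Once every state's coordinates have been converted to WGS84, the visualization step typically needs them in a web-friendly format. The Python sketch below assembles a GeoJSON FeatureCollection from already-converted borings. The record field names (`name`, `state`, `source_epsg`, `latitude`, `longitude`) and the sample values are hypothetical and not part of DIGGS or the API.

```python
import json


def to_feature_collection(borings):
    """Build a GeoJSON FeatureCollection from borings already in WGS84."""
    features = []
    for boring in borings:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON positions are ordered longitude, then latitude.
                "coordinates": [boring["longitude"], boring["latitude"]],
            },
            "properties": {
                "name": boring["name"],
                "state": boring["state"],
                "source_epsg": boring["source_epsg"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical converted records from two of the three states.
borings = [
    {"name": "BH-1", "state": "VA", "source_epsg": 2283,
     "latitude": 38.9234567, "longitude": -77.5678901},
    {"name": "B-12", "state": "MD", "source_epsg": 2248,
     "latitude": 39.1, "longitude": -76.9},
]
collection = to_feature_collection(borings)
print(json.dumps(collection, indent=2))
```

Keeping the source EPSG code in the properties preserves a trail back to the original coordinates, which helps when a converted point later looks suspicious.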
Getting Started Today
You don't need to wait for DIGGS 3.0's official release to start benefiting from improved georeferencing and coordinate standardization. Here's what you can do right now:
Begin implementing the 3.0 CRS compliance guidance NOW
- Audit Your Data: Review existing boring log databases for coordinate system documentation
- Identify EPSG Codes: Determine which georeferenced coordinate systems your organization commonly uses
- Add srsName Attributes: Start adding srsName attributes to geometry elements using the OGC CRS Definition Resolver URL (for standard CRSs with EPSG codes)
- Define Local CRSs: For local and station/offset CRSs, begin building GML CRS definitions that can be referenced by srsName
- Engage with the DIGGS Community: Join monthly DIGGS meetings for further guidance and to stay updated on 3.0 development progress
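For the step of adding srsName attributes, a short Python sketch shows one way to tag geometry elements that lack the attribute. It is only appropriate when you already know the coordinates are in the CRS being declared. Station/offset values like those in the legacy example must not be labeled with a georeferenced CRS. The function name and the list of geometry element names are our assumptions, and the sample omits the DIGGS default namespace for brevity.

```python
import xml.etree.ElementTree as ET

SRS = ("http://www.opengis.net/def/crs-compound?"
       "1=http://www.opengis.net/def/crs/EPSG/0/26929"
       "%262=http://www.opengis.net/def/crs/EPSG/0/6360")


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def add_missing_srs_name(root, srs_name,
                         geometry_tags=("PointLocation", "LinearExtent"),
                         srs_dimension=None):
    """Set srsName on geometry elements that lack it; return how many changed."""
    updated = 0
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        if local_name(element.tag) in geometry_tags and element.get("srsName") is None:
            element.set("srsName", srs_name)
            if srs_dimension is not None:
                element.set("srsDimension", str(srs_dimension))
            updated += 1
    return updated


SAMPLE = """
<Borehole xmlns:gml="http://www.opengis.net/gml/3.2">
  <referencePoint>
    <PointLocation gml:id="p1">
      <gml:pos>654321.45 1234567.89 287.5</gml:pos>
    </PointLocation>
  </referencePoint>
</Borehole>
"""

root = ET.fromstring(SAMPLE)
count = add_missing_srs_name(root, SRS, srs_dimension=3)
print(count)
```

Existing srsName values are left untouched, so the function can be run safely over files that are already partly compliant.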
Try the Coordinate Conversion Tools
Visit diggs.geosetta.org to access our free coordinate conversion tools:
- Test Your Coordinates: Upload sample coordinate pairs and validate conversion accuracy
- Find EPSG Codes: Use our US State Plane finder to identify correct coordinate systems for your projects
- Batch Convert Data: Convert up to 25 coordinates at once to different coordinate systems
- Validate Geographic Bounds: Verify that your coordinates fall within expected regions
Are you a public agency with geotechnical data?
As part of our nonprofit mission to advance geotechnical data accessibility, Geosetta will process and convert your public datasets to DIGGS format at no cost. Contact us to contribute your data to the growing public geotechnical database. We'll standardize coordinates, enhance georeferencing, and make your data available through our open platform.
Looking Forward: The Future of Geotechnical Data
DIGGS 3.0's mandatory georeferencing requirements represent more than a technical upgrade—they signal a fundamental shift toward data-driven geotechnical engineering. With reliable coordinate references, we're enabling:
- Real-Time Field Validation: Mobile applications that instantly validate coordinates against state coordinate systems
- Predictive Subsurface Modeling: Regional models trained on properly georeferenced datasets spanning decades
- Climate Resilience Planning: Historical geotechnical data integrated with climate models for infrastructure adaptation
- Automated Quality Control: AI systems that detect coordinate anomalies and data quality issues
- International Collaboration: Seamless data sharing between countries using standardized coordinate transformations
Join the DIGGS Effort
The transition to mandatory georeferencing using DIGGS 3.0's new validation tools requires industry-wide coordination and support. Here's how you can participate:
- GitHub Contribution: Help develop DIGGS validation tools at https://github.com/DIGGSml/
- Monthly DIGGS Meetings: Join our monthly discussions where we tackle georeferencing implementation challenges. Contact Allen Cadden or Ross Cutts for meeting invites.
- Beta Testing: Volunteer your organization for DIGGS 3.0 pilot testing and coordinate conversion validation.
- Industry Advocacy: Promote DIGGS adoption within your professional networks and state DOT relationships
A Solution for a Crisis
The georeferencing crisis that has plagued our industry for decades finally has a solution. DIGGS 3.0's mandatory srsName requirements, combined with efforts to convert historic data into DIGGS, will recover hundreds of millions of dollars in previously unusable geotechnical data.
This isn't just about better data management—it's about unlocking the full potential of our profession's collective knowledge for safer, more efficient infrastructure development.
Conclusion: A New Era for Geotechnical Data
We can no longer accept that nearly half of our taxpayer-funded geotechnical investigations remain effectively unusable due to coordinate system ambiguity.
DIGGS 3.0 represents our industry's commitment to solving this crisis through standards-based enforcement. The free coordinate conversion tools available today provide immediate relief for existing datasets while preparing the foundation for a fully interoperable geotechnical data ecosystem.
Every boring log that we successfully georeference is a victory against waste, inefficiency, and missed opportunities. Every DIGGS 3.0 compliant file created today is an investment in our profession's data-driven future.
The technology exists. The standards are being finalized. The tools are free and available now.
The question isn't whether we can solve the georeferencing crisis—it's how quickly we can implement the solution.
Join us in making 2026 the year we finally stop throwing away geotechnical data and start building the interconnected subsurface intelligence network our infrastructure deserves.
Best regards,
Ross Cutts, P.E., M.ASCE
Schnabel Engineer, Geosetta President, and proud member of the DIGGS team