it编程 > 前端脚本 > Powershell

详解如何利用PowerShell提取wps表格中嵌入的图片

108人参与 2025-02-13 Powershell

针对excel下打开表格图片显示 #name?编辑栏显示为 =@_xlfn.dispimg( 样公式的问题,一般需要在 wps 程序下,ctrl+f 查找范围选值,输入 =dispimg 全选,然后再右键转换为浮动图片。如果是excel中,则是查找公式 dispimg。

查阅网上得资料得知,可以通过解压表格文件,然后 根据公式中的第一参数(通常以 id 开头)看 xl_rels\cellimages.xml.rels 目录下的 name (其值为 dispimg 函数的第一参数)和 r:embed (其值以 rid 开头)的对应关系,然后再看 id ( rid 开头 ) 和 target(图片路径) 的对应关系,进而得到图片的路径。

在 llm 的帮助下进而有以下 ps 脚本。

function get-exceldispimages {
    param (
        [parameter(mandatory=$true)]
        [string]$excelpath,
        
        [parameter(mandatory=$false)]
        [string]$outputfolder = ".\excelimages"
    )

    # 辅助函数:安全地读取文件内容
    function read-filecontent {
        param (
            [string]$path
        )
        try {
            # 使用.net方法直接读取文件,避免powershell路径解析问题
            if ([system.io.file]::exists($path)) {
                return [system.io.file]::readalltext($path)
            }
            write-host "file not found: $path" -foregroundcolor yellow
            return $null
        }
        catch {
            write-host "error reading file $path : $_" -foregroundcolor yellow
            return $null
        }
    }

    try {
        # 验证excel文件是否存在
        if (-not (test-path -literalpath $excelpath)) {
            throw "excel file not found: $excelpath"
        }

        # 确保excelpath是绝对路径
        $excelpath = (get-item $excelpath).fullname

        # 创建输出文件夹(使用绝对路径)
        $outputfolder = [system.io.path]::getfullpath($outputfolder)
        if (-not (test-path -path $outputfolder)) {
            new-item -itemtype directory -path $outputfolder -force | out-null
        }

        # 创建excel com对象
        $excel = new-object -comobject excel.application
        $excel.visible = $false
        $excel.displayalerts = $false
        $workbook = $excel.workbooks.open($excelpath)
        
        # 用于存储找到的dispimg id和信息
        $dispimgids = @()
        $imagemapping = @{}

        # 遍历所有工作表
        foreach ($worksheet in $workbook.worksheets) {
            $usedrange = $worksheet.usedrange
            $rowcount = $usedrange.rows.count
            $colcount = $usedrange.columns.count
            
            for ($row = 1; $row -le $rowcount; $row++) {
                for ($col = 1; $col -le $colcount; $col++) {
                    $cell = $usedrange.cells($row, $col)
                    $formula = $cell.formula
                    
                    # 检查是否包含dispimg函数并提取所有参数
                    if ($formula -match 'dispimg\("([^"]+)"') {
                        $imageid = $matches[1]
                        write-host "found dispimg: $formula" -foregroundcolor gray
                        
                        # 创建参数列表
                        $params = @{
                            'id' = $imageid
                            'cell' = $cell.address()
                            'formula' = $formula
                            'worksheet' = $worksheet.name
                        }
                        
                        # 提取所有参数,包括图片id
                        $paramvalues = @()
                        $formula -match 'dispimg\((.*?)\)' | out-null
                        $paramstring = $matches[1]
                        $paramvalues = $paramstring -split ',' | foreach-object { 
                            $_ -replace '"', '' -replace '^\s+|\s+$', ''
                        }
                        
                        # 存储所有参数
                        for ($i = 0; $i -lt $paramvalues.count; $i++) {
                            $params["param$i"] = $paramvalues[$i]
                        }
                        
                        $imagemapping[$imageid] = $params
                        $dispimgids += $imageid
                    }
                }
            }
        }

        # 创建临时目录
        $temppath = join-path ([system.io.path]::gettemppath()) "exceltemp"
        if (test-path $temppath) {
            remove-item $temppath -recurse -force
        }
        new-item -itemtype directory -path $temppath -force | out-null

        # 复制excel文件到临时目录并解压
        $tempexcel = join-path $temppath "temp.xlsx"
        $tempzip = join-path $temppath "temp.zip"
        copy-item -path $excelpath -destination $tempexcel -force
        if (test-path $tempzip) { remove-item $tempzip -force }
        rename-item -path $tempexcel -newname "temp.zip" -force
        expand-archive -path $tempzip -destinationpath $temppath -force

        # 检查media文件夹并处理图片
        $mediapath = join-path $temppath "xl\media"
        if (test-path $mediapath) {
            # 显示dispimg参数和图片对应关系
            write-host "`nfound $($imagemapping.count) dispimg functions" -foregroundcolor cyan
            
            if ($imagemapping.count -gt 0) {
                write-host "`n=== dispimg functions found ===" -foregroundcolor cyan
                write-host "found $($imagemapping.count) dispimg functions" -foregroundcolor cyan
                
                # 显示所有找到的dispimg函数
                write-host "`n=== dispimg functions details ===" -foregroundcolor yellow
                foreach ($id in $imagemapping.keys) {
                    $params = $imagemapping[$id]
                    write-host "cell: [$($params.worksheet)]$($params.cell)" -foregroundcolor gray
                    write-host "formula: $($params.formula)" -foregroundcolor gray
                }
                
                # 首先从cellimages.xml获取dispimg id到rid的映射
                $dispimgtorid = @{}
                $cellimagespath = join-path $temppath "xl\cellimages.xml"
                write-host "`n=== reading cellimages.xml ===" -foregroundcolor yellow
                write-host "path: $cellimagespath" -foregroundcolor gray
                
                if (test-path $cellimagespath) {
                    try {
                        $xmlcontent = get-content $cellimagespath -raw -encoding utf8
                        write-host "`nraw xml content:" -foregroundcolor gray
                        write-host $xmlcontent
                        
                        # 使用正则表达式提取所有cellimage元素
                        $matches = [regex]::matches($xmlcontent, '<etc:cellimage>.*?</etc:cellimage>', [system.text.regularexpressions.regexoptions]::singleline)
                        write-host "`nfound $($matches.count) cellimage elements" -foregroundcolor gray
                        
                        foreach ($match in $matches) {
                            $cellimagexml = $match.value
                            write-host "`nprocessing cellimage element:" -foregroundcolor gray
                            
                            # 提取name属性(包含dispimg id)
                            if ($cellimagexml -match 'name="([^"]+)"') {
                                $dispimgid = $matches[1]
                                write-host "found dispimg id: $dispimgid" -foregroundcolor gray
                                
                                # 提取r:embed属性(包含rid)
                                if ($cellimagexml -match 'r:embed="(rid\d+)"') {
                                    $rid = $matches[1]
                                    $dispimgtorid[$dispimgid] = $rid
                                    write-host "  mapping: dispimg id $dispimgid -> rid $rid" -foregroundcolor green
                                }
                            }
                        }
                    }
                    catch {
                        write-host "error reading cellimages.xml: $($_.exception.message)" -foregroundcolor red
                        write-host $_.exception.stacktrace -foregroundcolor red
                    }
                }
                else {
                    write-host "cellimages.xml not found!" -foregroundcolor red
                }
                
                # 从cellimages.xml.rels获取rid到实际图片的映射
                $ridtoimage = @{}
                $cellimagesrelspath = join-path $temppath "xl\_rels\cellimages.xml.rels"
                write-host "`n=== reading cellimages.xml.rels ===" -foregroundcolor yellow
                write-host "path: $cellimagesrelspath" -foregroundcolor gray
                
                if (test-path $cellimagesrelspath) {
                    try {
                        [xml]$relsxml = get-content $cellimagesrelspath -raw
                        write-host "`nxml content:" -foregroundcolor gray
                        write-host $relsxml.outerxml
                        
                        $relsxml.relationships.relationship | foreach-object {
                            if ($_.target -match "media/") {
                                $ridtoimage[$_.id] = $_.target
                                write-host "  rid $($_.id) -> $($_.target)" -foregroundcolor green
                            }
                        }
                    }
                    catch {
                        write-host "error reading cellimages.xml.rels: $($_.exception.message)" -foregroundcolor red
                        write-host $_.exception.stacktrace -foregroundcolor red
                    }
                }
                else {
                    write-host "cellimages.xml.rels not found!" -foregroundcolor red
                }
                
                write-host "`n=== summary of mappings ===" -foregroundcolor yellow
                write-host "dispimg id -> rid mappings:" -foregroundcolor gray
                $dispimgtorid.getenumerator() | foreach-object {
                    write-host "  $($_.key) -> $($_.value)" -foregroundcolor gray
                }
                
                write-host "`nrid -> image mappings:" -foregroundcolor gray
                $ridtoimage.getenumerator() | foreach-object {
                    write-host "  $($_.key) -> $($_.value)" -foregroundcolor gray
                }
                
                # 处理每个dispimg函数
                write-host "`n=== processing dispimg functions ===" -foregroundcolor yellow
                foreach ($id in $imagemapping.keys) {
                    $params = $imagemapping[$id]
                    
                    write-host "`nprocessing: [$($params.worksheet)]$($params.cell)" -foregroundcolor green
                    write-host "formula: $($params.formula)" -foregroundcolor gray
                    write-host "dispimg id: $id" -foregroundcolor gray
                    
                    # 从dispimg id查找对应的rid
                    $rid = $dispimgtorid[$id]
                    if ($rid) {
                        write-host "found rid: $rid" -foregroundcolor green
                        
                        # 从rid查找对应的图片路径
                        $imagepath = $ridtoimage[$rid]
                        if ($imagepath) {
                            write-host "found image path: $imagepath" -foregroundcolor green
                            $imagename = split-path $imagepath -leaf
                            
                            # 查找对应的图片文件
                            $mediafile = get-childitem -literalpath $mediapath | where-object { $_.name -eq $imagename }
                            if ($mediafile) {
                                $outputpath = join-path $outputfolder "$id$($mediafile.extension)"
                                copy-item -literalpath $mediafile.fullname -destination $outputpath -force
                                write-host "successfully copied: $imagename -> $outputpath" -foregroundcolor cyan
                           #    以 media 文件夹下的文件名复制
                           #    copy-item  $mediafile.fullname  $(join-path $outputfolder $(split-path $imagepath -leaf))
                            }

                            else {
                                write-host "image file not found: $imagename" -foregroundcolor red
                            }
                        }
                        else {
                            write-host "no image path found for rid: $rid" -foregroundcolor red
                        }
                    }
                    else {
                        write-host "no rid found for dispimg id: $id" -foregroundcolor red
                    }
                }
            } else {
                write-host "no dispimg functions found in the workbook." -foregroundcolor yellow
            }
        }
    }
    catch {
        write-host "错误: $($_.exception.message)" -foregroundcolor red
    }
    finally {
        # 清理
        if ($workbook) {
            $workbook.close($false)
        }
        if ($excel) {
            $excel.quit()
            [system.runtime.interopservices.marshal]::releasecomobject($excel) | out-null
        }
        if (test-path $temppath) {
            remove-item $temppath -recurse -force
        }
        [system.gc]::collect()
        [system.gc]::waitforpendingfinalizers()
    }
}

一个可能的用法如下

get-exceldispimages -excelpath "c:\users\demo\documents\dispimg_wps.xlsx" -outputfolder $pwd\output

生成的图片在当前目录下的 output 文件夹下。

以上就是详解如何利用powershell提取wps表格中嵌入的图片的详细内容,更多关于powershell提取表格中图片的资料请关注代码网其它相关文章!

(0)
打赏 微信扫一扫 微信扫一扫

您想发表意见!!点此发布评论

推荐阅读

利用PowerShell一键下载Nuget某个包的所有版本

02-13

Mongo Shell 执行环境的基本操作

02-13

Xshell远程连接失败以及解决方案

01-14

接入混元AI大模型! 努比亚Watch GT智能手表发布: 599元

01-04

Win11/win10管理PowerShell新方式:支持自动更新

03-03

微软 AI Shell 实测! Win11 命令行效率利器

12-30

猜你喜欢

版权声明:本文内容由互联网用户贡献,该文观点仅代表作者本人。本站仅提供信息存储服务,不拥有所有权,不承担相关法律责任。 如发现本站有涉嫌抄袭侵权/违法违规的内容, 请发送邮件至 2386932994@qq.com 举报,一经查实将立刻删除。

发表评论